Digital signal processor.

The present invention improves a digital signal processor and, more particularly, its calculation methods for
motion compensation: reducing the required amount of calculation when an amount of distortion between a last
frame block and a current frame block is calculated; processing a direct memory access at a higher efficiency;
processing a subdivided data calculation at a higher speed; processing a branch instruction occurring in the
pipeline process at a higher efficiency; processing an interruption occurring in a repeat process operation
with greater convenience; and, furthermore, reducing the required amount of calculation through hierarchized
minimum distortion searching processes.
DIGITAL SIGNAL PROCESSOR

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention relates to a digital signal processor capable of mainly performing arithmetic
processing of a signal series.

Description of the Prior Art

Fig. 1 is a schematic block diagram of an arrangement of a first conventional digital signal processor
which is disclosed in "A 50nS FLOATING-POINT SIGNAL PROCESSOR VLSI", ICASSP 86,
1986. It should be noted that for the sake of simplicity, only required blocks are illustrated in Fig. 1.
In Fig. 1, reference numeral 1 indicates an instruction memory for storing an instruction word; 2 denotes
a program counter for outputting an address of the instruction memory 1 to an output path 51; 3 represents
an instruction execution control unit for decoding the instruction word supplied from the instruction memory
1 via an output path 52, and for outputting a control signal via an output path 53 to the program counter 2, a
calculation unit and the like; 4 is an internal data memory for storing calculation data; 5 is a data
bus for transferring data read out from the internal data memory 4 via the output path 54; 6a denotes a
multiplier for performing multiplication on input data supplied from the data bus 5 via an output path 55;
7 indicates an accumulator for performing an accumulating operation; 8 represents an accumulating register
for holding the accumulation result; reference numeral 9 indicates a repeat counter for repeating the
[illegible]; reference numeral 63 indicates an input/output path for connecting the repeat counter 9
and the data bus 5; 64 represents a selector for inputting the data which has been supplied via the output
path 56 from the multiplier 6a and the data which has been supplied from the data bus 5 via the output
path 57 thereinto, and for supplying output data via the output path 58 to the accumulator; 65 denotes a
selector for inputting the data which has been supplied from the data bus 5 and the output data
[illegible] from the accumulating register 8 therein, and for supplying the output data via the
[illegible] to the accumulator; and reference numeral 66 is an output path for transmitting a control
[illegible].

An operation of this digital signal processor will now be described. In response to the
address output from the program counter 2 via the output path 51, the instruction word read from the
instruction memory 1 is input via the output path 52 to the instruction execution control unit 3. Based upon
the decoded instruction word, the instruction execution control unit 3 controls the operations by sending the
control signal via the output path 53 to various sections.

The internal data memory 4 reads at most two pieces of data to the data bus 5 via the output path 54,
and the multiplier 6a outputs the multiplication results with respect to two pieces of input data which has
been [illegible] to the output data which has been supplied from the selector 65 via the output path
[illegible]. The accumulation result of the accumulator 7 is stored via the output path 62 into the
accumulating register 8.

It should be noted that the same instruction such as the above-described accumulation is carried out in
[illegible] in accordance with the output data which has been supplied from the data bus 5 via the
input/output path 63 [illegible].

[illegible] to a block "A" of a certain data series [illegible] search blocks of M in number as shown in
the diagram of Fig. 2.
An amount of distortion is calculated by equation 1: 
$$d_k = \sum_{i=1}^{W} (x_i - y_{ki})^2 \qquad (1)$$

where,

the block A is: $x = [x_1, x_2, \ldots, x_W]$

the search blocks are: $y_k = [y_{k1}, y_{k2}, \ldots, y_{kW}]$

k = 1 to M

"M" and "W" are fixed integers.

That is to say, with respect to the output data x_i and y_ki which have been read from the data memory 4
for the respective blocks, the accumulating calculations are performed as many times as the number of the
data (steps ST 11, ST 12), the distortion comparison is performed after the distortions of the M blocks
have been obtained, and thereafter a minimum distortion and a block number thereof are obtained (step ST 13).

In this case, the digital signal processor having the arrangement shown in Fig. 1 performs a
sum-of-product calculation within one machine cycle, so that the sum-of-product process requires
(W x M) calculations, and the comparison and update process for the minimum distortion and the block
number thereof is further required M times. As a result, a processing time required for the calculations
becomes t x (M x W + M), where t is one machine cycle.
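The cycle count above can be illustrated with a minimal Python sketch of the exhaustive search. It assumes a squared-difference distortion term (the distortion measure is an assumption here), one machine cycle per accumulation step and one per compare-and-update step; the names `full_search` and `cycles` are hypothetical and do not come from the cited processor.

```python
def full_search(x, blocks):
    """Exhaustive minimum-distortion search over M search blocks.

    x      : reference block "A" as a list of W samples
    blocks : list of M search blocks, each a list of W samples
    Returns (best_index, best_distortion, cycles), where cycles models
    one machine cycle per accumulation and one per compare/update.
    """
    best_k, best_d, cycles = None, None, 0
    for k, y in enumerate(blocks):
        d = 0
        for xi, yi in zip(x, y):
            d += (xi - yi) ** 2   # assumed distortion term
            cycles += 1           # one accumulation per machine cycle
        cycles += 1               # compare with the current minimum, update
        if best_d is None or d < best_d:
            best_k, best_d = k, d
    return best_k, best_d, cycles
```

With W = 3 samples and M = 3 blocks the modelled count is M x W + M = 12 cycles, in line with the expression t x (M x W + M).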

Since the conventional digital signal processor has been arranged with the above-described constructions,
when, for instance, a block having a minimum distortion is to be detected between a block having a certain
data series and "M" pieces of search blocks, the distortions for all of the "M" blocks are calculated, these
distortions are compared with each other, and then a block number (position) of a minimum distortion is
detected. As a result, there are drawbacks that an amount of calculations becomes very large and a
required processing time is considerably long.

Fig. 5 is a schematic block diagram of the digital signal processing processor disclosed in "A 50nS 
FLOATING-POINT SIGNAL PROCESSOR VLSI", P.401, Proceedings of ICASSP 86, 1986. It should be 
noted that for the sake of simplicity, only necessary blocks are shown in Fig. 5. 

In the block diagram of Fig. 5, reference numeral 1 denotes an instruction memory for storing an 
instruction word; 3 indicates an instruction execution control unit for controlling various operations of 
decoding the instruction word and calculations; 5 is a data bus for mutually connecting the following 
sections with each other and for mainly performing a data transmission; 4 is an internal data memory for 
storing the calculation data; 6 represents a calculating unit for performing various calculations with respect 
to two pieces of data which have been transferred from the data bus 5; 8 denotes an address generating 
unit capable of generating at most 3 addresses at the same time; 10 represents an external data memory 
connecting unit for controlling the read/write operations to an external data memory (not shown); 78 is an
external address bus; 79 denotes an external data bus; 80 indicates an external device control signal bus;
81 is a serial port (referred to as an "SIO" hereinafter) for performing a serial data transmission between
external devices (not shown in detail); and reference numeral 82 denotes a direct data memory transfer
control unit (referred to as a "DMAC" hereinafter) for controlling a direct data memory transfer (referred to
as a "DMA" hereinafter) between SIO 81 and external data memory connecting unit 10.

Fig. 6 illustrates a timing chart of external data memory accessing operations of the digital signal 
processor shown in Fig. 5. Fig. 6a is a read timing chart and Fig. 6b is a write timing chart. In Figs. 6a and 
6b, reference numeral 291 is an external address terminal; 292 represents a strobe signal for controlling the 
read timing supplied from the external data memory; 293 is an external data terminal; and, 294 represents a 
strobe signal for controlling write timing to the external data memory. 

An operation of the digital signal processor will now be described. In Fig. 5, the instruction word of the 
designated address is read out from the instruction memory 1, and input via an input/output path 201 to the 
instruction execution control unit 3. The control signal and data which have been decoded by the instruction 
execution control unit 3 are transferred via an output path 202 to the data bus 5. 

In response to this control signal, calculation data from the internal data memory 4 to the data bus 5 is 
read via an output path 203, the data from the data bus 5 is input via an output path 204 to the calculation 
unit 6, the calculating process and calculation result at the calculation unit 6 is output via an output path 205 
to the data bus 5, the data sent from the data bus 5 to the internal data memory 4 is written via an output 
path 206, and various operations such as the external data memory access are controlled. 

Both the address of the input data from the internal data memory 4 to the calculation unit 6 and the 
writing address of the output data from the calculation unit 6 to the internal data memory 4 are controlled by 



EP 0 373 291 A2 



the address generating unit 8 having three systems of address generators. This address generating unit 8
[illegible] the readable/writable data input from the data bus 5 via an input/output path
[illegible] data memory 4 and the external data memory connection unit 10 in response to
the data which has been output via output paths 208 and 209, and determines the input data and output
data.

[illegible] specific register of DMAC 82 via the data bus 5 and a path
[illegible] of operations other than the DMA transfer are interrupted, and the data
transfer is carried out from SIO 81 to the external data memory connection unit 10 via the output path 208
[illegible]. The transfer word number is set into the specific register of DMAC 82 in response to the
instruction which has been previously output via the output path 201. As the settable transfer word numbers,
only 64, 128, 256 and 512 words can be selected.

A description will now be made of Fig. 6. When the readout operation of the external data memory is
carried out as shown in Fig. 6a, an RE terminal of the external device control signal bus 80 becomes active
for 1 machine cycle, the strobe signal 292 informs the external device of the data readout, and the address
data is output from the external address bus 78 for 1 machine cycle. Furthermore, the data read from the
external device is fetched at the trailing edge of the same cycle.

When the writing operation of the external data memory as shown in Fig. 6b is carried out, a WE
terminal of the external device control signal bus 80 becomes active for 1 machine cycle, the data writing
operation is announced to the external device, the address data is output from the external address bus 78,
and the write data is output from the external data bus 79 for one machine cycle.
Since the second conventional digital signal processor is arranged as described above, the following
problems exist:

a. [illegible] memory and external data memory, the processing efficiency of the internal calculation is
lowered.

b. When the external data memory is accessed by way of the direct data transfer, the address of the
external data memory is a simple increasing sequence and the transfer word number cannot be arbitrarily
[illegible].

[illegible] of the external data [illegible].

[illegible] schematic block diagram of the conventional digital signal [illegible] in Fig. 7. In Fig. 7,
reference [illegible] the program memory 1, reading of data, calculation, and writing of calculation
result; 4 represents [illegible] selectors 301 and 302 to the multiplier circuit 303 or calculation unit 6
as the data X and Y. [illegible]
pieces of data X and Y, and also supplies the resultant data to the data memory 4 via the data bus 5 for
writing. The above-described series of processing operations are performed by such a pipeline process that
the control circuit 3 reads the microprogram which has been stored in the program memory 1, the
instruction is decoded by the control circuit 3, and the control signal 31 is output to the respective circuits.

The number of machine cycles required when a sum-of-product calculation, a complex number calculation,
and a binary tree search vector quantizing calculation are executed in such a DSP will now be
described.



(1). Sum-of-product calculation.

Fig. 8 shows a calculation flow of a sum-of-product calculation. That is, at first, in a step ST 21, an
initialization is executed. Namely, an address for the data memory 4 is set, and a loop number is set in the
multiplier circuit 303 and calculation unit 6. Then, in a step ST 22, the sum-of-product calculation is
performed in one machine cycle. In a next step ST 23, a decision is made whether or not a count
value of the repeat counter is equal to zero. In other words, a decision is made whether or not the repeat
calculations have been executed the M times set in the previous initialization step.

In this case, if the calculation result of the sum-of-product calculation output from the calculation unit 6
is assumed to be "Z", this "Z" will be expressed as follows:

$$Z = \sum_{i=1}^{M} (x_i \times y_i) \qquad (2)$$

It should be noted that input data series X and Y are defined by:
$X = (x_1, \ldots, x_M)$, and
$Y = (y_1, \ldots, y_M)$.

Since the reading of two pieces of data from the data memory 4, the multiplication, and the accumulation
of the multiplied results are pipeline-processed, an amount of required calculations becomes M machine
cycles per one output data when the loop number "M" is sufficiently great. This is also the case when
the data size is equal to "n" bits.
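As a minimal illustration of equation (2) and of the stated cost of M machine cycles per output, the following Python sketch accumulates the products in a loop. The function name `sum_of_products` and the one-cycle-per-pass cost model are assumptions made for illustration only.

```python
def sum_of_products(xs, ys):
    """Return (Z, cycles) with Z = sum of x_i * y_i over the series.

    Each loop pass stands for one pipelined multiply-accumulate,
    modelled here as one machine cycle (an assumed cost model).
    """
    z = 0
    cycles = 0
    for x, y in zip(xs, ys):
        z += x * y      # multiply and accumulate
        cycles += 1     # one machine cycle per pass
    return z, cycles
```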



(2). Complex number calculation. 

Fig. 9 illustrates a calculation flow of a complex number calculation. That is to say, in a step ST 31, an
initialization is carried out similar to the above-described step ST 21. In a subsequent step ST 32 and a next
step ST 33, a calculation on a real number part and a calculation on an imaginary number part are
separately executed in two machine cycles respectively. In a next step ST 34, a decision is made whether
or not the count value of the loop counter is equal to zero. In other words, a decision is made whether or
not the calculations have been performed the M times set in the initialization.

In this case, if the input data X and Y are set to $X = a_1 + j a_2$ and $Y = b_1 + j b_2$, respectively, a
multiplication between these complex numbers X and Y is as follows:

$$X \times Y = (a_1 b_1 - a_2 b_2) + j(a_1 b_2 + a_2 b_1) \qquad (3)$$

As a result, the calculations on the real number part and imaginary number part are executed in the two 
steps of ST 32 and ST 33. Accordingly, an amount of required calculation becomes five machine cycles per 
one output data. 
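Equation (3) can be checked with a short Python sketch that computes the real and imaginary parts separately, as in steps ST 32 and ST 33. The function name `complex_multiply` is hypothetical.

```python
def complex_multiply(a1, a2, b1, b2):
    """Multiply X = a1 + j*a2 by Y = b1 + j*b2.

    The real and imaginary parts are formed separately,
    mirroring the two calculation steps of the flow.
    """
    real = a1 * b1 - a2 * b2   # real number part
    imag = a1 * b2 + a2 * b1   # imaginary number part
    return real, imag
```

The result agrees with Python's built-in complex arithmetic, for example for X = 1 + j2 and Y = 3 + j4.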



(3). Binary tree search vector quantizing calculation.

Fig. 10 represents a calculation flow for explaining a binary tree search vector quantizing calculation.
The function of this binary tree search is to perform a matching calculation between an input vector "x"
and two output vectors "y0" and "y1" at a certain search stage so as to detect the output vector having the
smaller matching distortion, and to repeat such a matching calculation operation on the two output vectors
located at the stage below the detected vector.
As the above-described matching calculation, a vector inner product is utilized. Assuming that an
element number of a vector is "k", a matching distortion quantity is defined as follows:

$$d_0 = x \cdot y_0 = \sum_{i=1}^{k} (x_i \times y_{0i}) \qquad (4)$$

$$d_1 = x \cdot y_1 = \sum_{i=1}^{k} (x_i \times y_{1i}) \qquad (5)$$

where

$x = (x_1, \ldots, x_k)$,

$y_0 = (y_{01}, \ldots, y_{0k})$,

$y_1 = (y_{11}, \ldots, y_{1k})$.



As a consequence, at steps ST 42 and 43, "d0" and "d1" are calculated. In the subsequent step ST 44,
a comparison is made between "d0" and "d1". Then, the process is advanced to the subsequent process.
Accordingly, an amount of required calculation per one stage is equal to (2k + 5) machine cycles.
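The stage-by-stage search can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tree is stored as a list of stages in which the children of node j sit at indices 2j and 2j + 1, ties go to "y0", and the per-stage cost of (2k + 5) machine cycles is simply taken from the text. The names `inner_product` and `binary_tree_search` are hypothetical.

```python
def inner_product(x, y):
    """Matching distortion quantity as in equations (4) and (5)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def binary_tree_search(x, stages):
    """Descend a binary tree of output vectors.

    stages : list of lists; stage s holds 2**(s+1) vectors, and the
             children of node j are at indices 2*j and 2*j + 1
             (an assumed storage layout).
    Returns (index of the vector selected at the last stage, cycles).
    """
    k = len(x)
    node, cycles = 0, 0
    for stage in stages:
        y0, y1 = stage[2 * node], stage[2 * node + 1]
        d0, d1 = inner_product(x, y0), inner_product(x, y1)
        # Keep the vector with the smaller matching distortion.
        node = 2 * node if d0 <= d1 else 2 * node + 1
        cycles += 2 * k + 5   # per-stage cost quoted in the text
    return node, cycles
```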

Since the third conventional digital signal processor is arranged as described above, even in case that
a data precision of half the maximum data size is sufficient, the amount of the various calculations
required is equal to that for a data precision of the maximum data size. As a
result, the calculation capabilities of the digital signal processor per se cannot be sufficiently utilized.

Fig. 11 is a schematic block diagram of the conventional digital signal processor (referred to as a
"DSP" hereinafter) disclosed in, for instance, "A 50nS FLOATING POINT SIGNAL PROCESSOR VLSI", on
page 401, IEEE, ICASSP86. It should be noted that for the sake of simplicity, only necessary blocks are
represented in Fig. 11.

In the DSP shown in Fig. 11, reference numeral 1 indicates a program memory; 3 is a control circuit for
controlling data transfer, calculation, branching and so on; 31 represents an output path for outputting a
control signal from the control circuit 3; 404 indicates an output path from the control circuit 3 to the
program memory 1; 405 is an output path from the program memory 1 to the control circuit 3; 4 denotes a
data memory; 6 indicates a calculation unit including a multiplier, an arithmetic calculator, a shifter, an
accumulator and so on; 5 is a data bus; 409 represents output paths from the data memory 4 to the data
bus 5 and from the data bus 5 to the calculation unit 6; and reference numeral 410 denotes output paths
from the calculation unit 6 to the data bus 5 and from the data bus 5 to the data memory 4.

The operations of the DSP will now be described. The basic operations of the DSP are controlled by the
control circuit 3 based upon the program read from the program memory 1. Furthermore, the data read from the
data memory 4 is input into the calculation unit 6 and subjected to a series of processing operations such
as the instruction fetch, the decoding, data reading, calculation, and calculation result writing.

When the same instruction is consecutively performed by way of the pipeline processing, one 
instruction may be approximately performed within one machine cycle. As a consequence, in case that a 
single instruction is repeatedly executed, the process speed may be increased more if the process is more 
consecutively executed. 

However, if a specific condition is satisfied with the calculation results, the following branching process
is required in the branching program. That is, in such a branching program, an intermediate check point is
introduced in a routine, and the consecutive execution is once interrupted so as to judge a condition before
the consecutive execution process is completed, and further a comparison is made between the calculation
result data and the specific data. Thus, based upon the comparison result, the branching process is
executed.
Fig. 12 is a process flow for performing an intermediate check while a series of consecutive executions is
processed. The result of the calculation process is compared with a threshold value (steps ST 51 and 52).
Thereafter, a decision is made whether or not an interrupt condition is satisfied (step ST 53). If YES, then
this process is completed. If NO, another decision is made whether or not the final data is accomplished
(step ST 54). If NO, then the process is returned to the previous step ST 51, in which the above-described
operation is repeated. To the contrary, if YES, then this process is ended.

In a motion compensating process of an image encoding method, [illegible]
accumulation is employed for pattern matching so as to detect a minimum pattern. When, for instance, a
value which is now accumulated exceeds a minimum value, the remaining accumulation is a waste of
time. In such a case, the process is advanced to the next routine for the sake of efficiency.

To this end, it is useful to perform the intermediate check to some extent. However, the various
processes of comparisons and decisions, and also interruptions of the process, are accompanied by a loss of
time. Further, according to the conventional DSP, conditions can only be judged on the positive or
negative sign of the data. When a magnitude comparison is needed between the data and a specific
threshold value, a subtraction is first carried out between the data in question and the threshold value, and
thereafter a decision is performed based upon this subtraction result, resulting in a lower processing
efficiency of the DSP.
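The intermediate check discussed above can be illustrated with a minimal Python sketch. It assumes a sum-of-absolute-differences accumulation (the measure is not fixed at this point in the text), models the threshold comparison as a subtraction followed by a sign test, and performs the check only every `check_every` terms; the function and parameter names are hypothetical.

```python
def accumulate_with_check(x, y, current_min, check_every=4):
    """Accumulate |x_i - y_i| and abandon once the sum exceeds current_min.

    Returns (distortion, terms_used); distortion is None when the
    accumulation was abandoned at an intermediate check point.
    """
    acc = 0
    n = 0
    for xi, yi in zip(x, y):
        acc += abs(xi - yi)        # assumed distortion term
        n += 1
        if n % check_every == 0:   # intermediate check point
            diff = current_min - acc   # comparison done by subtraction
            if diff < 0:               # sign test: acc exceeds the minimum
                return None, n
    return acc, n
```

A small `check_every` abandons hopeless candidates sooner but spends more cycles on comparisons and branches, which is the trade-off the passage describes.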

If there are a plurality of comparison threshold values, the processing efficiency is lowered further.

For instance, in case that the processes are subdivided into plural kinds (n in number), the
comparisons between the data in question and (n-1) threshold values, and the branching instructions based
upon the comparison results are required. A loss of at least (n-1) x 2 machine cycles occurs.
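The (n-1) x 2 figure can be reproduced with a short Python sketch in which each threshold costs one subtraction and one sign-based branch. The thresholds are assumed to be sorted in ascending order, and the name `classify` and the two-cycle cost per threshold are illustrative assumptions.

```python
def classify(value, thresholds):
    """Sort a value into one of n = len(thresholds) + 1 kinds.

    Each threshold is handled by a subtraction followed by a branch on
    the sign of the result, modelled as 2 machine cycles.
    Returns (kind_index, cycles).
    """
    kind = 0
    cycles = 0
    for th in thresholds:      # assumed ascending order
        diff = value - th      # subtraction: 1 cycle
        cycles += 2            # plus the branch on the sign: 1 cycle
        if diff >= 0:
            kind += 1
    return kind, cycles
```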

Since the fourth conventional digital signal processor is so constructed, the processing efficiency is
lowered for the following reason. That is, in case that the branching process is carried out
depending upon the calculation results or intermediate calculation results, the process is interrupted during
the consecutive processing steps, and the subtractions and also the comparison processes are executed.

Fig. 13 is a simplified schematic block diagram of an audio signal processor (DSSP1) which has been
presented in Japanese Telecommunication Institute, symposium publication No. S10-1 in 1986. In the
audio signal processor shown in Fig. 13, reference numeral 1 denotes an instruction memory into which
instruction words have been stored; 3 represents an instruction execution control unit for controlling various
operations such as decoding of the instruction word and calculations; 2 indicates a program counter for
holding an instruction address; and 504 is a PC stack for preserving a return address used in the subroutine
process and interruption process. This PC stack 504 preserves an instruction address 531 output from the
program counter 2 just before the interruption process, until the process is accomplished. Reference
numeral 505 indicates a sequence control unit for controlling the entire operation of the processor; 506 is a
repeat control unit for performing a counting operation between the sequence control unit 505 and itself
during the loop/repeat operation; 9 is a repeat counter for counting a repeat number during the execution of
the repeat instruction; 508 is a program bus for transferring the decoded control data; 5 represents a data
bus for transferring main data; 510 is a bus interface register for connecting the program bus and data bus
5; 4 represents a data memory for storing calculation data; 6 indicates a calculation processing circuit for
performing arithmetic operations such as addition, subtraction, multiplication, and division; 513 is an
interruption control unit for starting the interrupting process; 514 is an external interrupt request signal; and
reference numeral 515 denotes an external interrupt acknowledgement signal.

An operation of DSSP1 will now be described. In general, a signal processor has a pipeline structure in
order to increase a processing speed. For instance, the signal processor shown in Fig. 13 has a
3-stage pipeline structure. Accordingly, the following description is made based upon the pipeline
processing.

In a first stage of the pipeline, an instruction word 511 which is designated by an instruction address
531 output from the program counter 2 is read from the instruction memory 1 and then input into the
instruction execution control unit 3.

In a second stage of the pipeline, both the control signal and data decoded by the instruction execution 
control unit 3 are transferred to the corresponding parts. 

In a third stage of the pipeline, various operations are controlled. That is, the calculation data 512 are
read from the data memory 4 to the data bus 5 in response to the control signal, and written from the data
bus 5 into the data memory 4, and furthermore processed in the calculation unit 6.

The interruption control unit 513 has a 3-level interrupt function other than RESET. RESET not only
resets the program counter 2 but also initializes control registers such as a status register (SR), a flag
register (FR), an interruption, and a bus control.

An interrupt 0 (INTR0) is non-maskable, and the program counter 2 is set to an address "1" when an
INTR0 signal is input.

An interrupt 1 (INTR1) is maskable, and is masked when RESET, INTR0, or INTR1 is accepted, or by
being designated in the program. A release of masking is executed by the program. When this interruption
is accepted, the program counter 2 is set to an address "2".

An interrupt 2 (INTR2) is maskable, and corresponds to a normal interruption having an
acknowledgement function.

INTR2 is masked when RESET, INTR0, INTR1, or INTR2 is accepted, or when it is so set by the program. A
release of masking is performed by the program. When an interruption request signal is accepted, an
acknowledgement signal (INTR2) is output, and then an address "3" is set to the program counter 2.
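The masking rules above can be summarized in a small Python model. This is a sketch of the described behavior only, not of the actual DSSP1 hardware: the class name `InterruptControl` and its methods are hypothetical, "INTR0" denotes interrupt 0, and masking by explicit program designation is left out.

```python
class InterruptControl:
    """Toy model of the three interrupt levels and their masking rules."""

    # Address loaded into the program counter when each level is accepted.
    VECTORS = {"INTR0": 1, "INTR1": 2, "INTR2": 3}

    def __init__(self):
        self.masked = {"INTR1": False, "INTR2": False}
        self.reset()

    def reset(self):
        # RESET masks both maskable levels.
        self.masked["INTR1"] = True
        self.masked["INTR2"] = True

    def unmask(self, name):
        # Release of masking is performed by the program.
        self.masked[name] = False

    def request(self, name):
        """Return the new program counter value, or None if masked."""
        if self.masked.get(name, False):   # INTR0 is never masked
            return None
        if name in ("INTR0", "INTR1"):
            self.masked["INTR1"] = True    # INTR1 masked by INTR0/INTR1
        self.masked["INTR2"] = True        # INTR2 masked by any acceptance
        return self.VECTORS[name]
```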
An instruction word which will be executed after the normally executed instruction word has been
stored in an address which is defined by adding 1 to the instruction address 531 where the normally
executed instruction word has been stored.

[illegible] of the pipeline, the instruction address 531 output from the program counter 2 is added
by "1" in the adder so as to produce an address defined by adding 1 to the instruction address 531.

In general, in the processor having a pipeline structure, a delay may be caused by this pipeline until the
instruction has been executed. As shown in Fig. 14, in a machine cycle of time period Tn, the H/W interrupt
request signal 514 is input into the interruption control unit 513.

In response to the above-described input, when the external interrupt acknowledgement signal 515 is
output from the interrupt control unit 513, an instruction word designated by an instruction address
[illegible] is read out. Since the interrupt signal has been received, the instruction word [illegible]
address of the instruction execution control unit 3 at the machine cycle of time period (Tn + 1) is
invalidated, and it is substituted by a no operation instruction (nop).

The program counter 2 is set to an address "3" at the machine cycle of time period Tn, whereby an
[illegible]. The program counter [illegible] is completely recovered from the interruption
[illegible] normal instruction is performed. When the interruption is executed during the repeat
[illegible]

[illegible] 85/1 Vol. J68 B no. 1, pages [illegible]. This diagram shows an entire search type method.
In Fig. [illegible], reference numeral 603 indicates a presently input block having a block size of
l1 x l2, used for compensating a motion [illegible] of the presently input frame; and 604 indicates a
motion vector search range representing a range of (l1 + 2m) x (l2 + 2n) in which a block is located.
This block is matching-processed with the presently input block 603 in the previously input frame.

In this case, the number "M" of the search blocks is expressed by:

$$M = (2m + 1) \times (2n + 1) \qquad (6)$$

[illegible] a range of -m to +m pixels in the horizontal direction and a range of -n to [illegible].

[illegible] is a block diagram of an image encoding transmission apparatus in general [illegible]
input block 603 of the input signal 601 [illegible] previously input signal 601; reference numeral
[illegible] compensation unit 602; 607 is an encoding unit for encoding a difference signal between the
[illegible] 601 [illegible]
thereof will now be described with reference to an explanatory diagram of Fig. 17.

The configuration shown in Fig. 16 has functions as follows: each of the inter-block distortions between the
presently input block "x" 603 with a size of l1 x l2 at a specific position within the presently input frame
and the respective blocks of M in number within the motion vector search range 604 of the previously input
frame is calculated; the relative position of the minimum distortion block "y", i.e., the block giving the
minimum value of these distortions, with respect to the position of the presently input
block 603, is searched as a motion vector; and a signal "ymin" of this block is output as a generated
prediction signal 605. Then, in the inter-frame encoding transmission, the prediction signal can be
produced even at the reception side by transmitting the motion vector information to the reception side.

Assume now that the number of the motion vectors "V" to be searched within a given motion vector
search range 604 is "M" (an integer not less than 2). In case that a sum-of-absolute-differences is
employed as a distortion quantity between the previous frame block at the position of the specific motion
vector "V" and the presently input block, an amount of distortion is calculated by:

$$d_i = \sum_{p=1}^{L} |y_{ip} - x_p| \qquad (7)$$

It should be noted that the input block is $x = (x_1, x_2, \ldots, x_L)$, the block to be searched is
$y_i = (y_{i1}, y_{i2}, \ldots, y_{iL})$, i = 1 to M, and L is equal to $l_1 \times l_2$. Thus, the motion vector V is obtained by:

$$V = V_i \ (\min d_i \mid i = 1 \text{ to } M) \qquad (8)$$

Then, a calculation amount S1 of this case is obtained by the following equation when the
sum-of-absolute-differences calculation needs "a" machine cycles and the comparison process needs "b"
machine cycles.

S1 = L x M x a + M x b (9)

In case that, for instance, a = 1 machine cycle, b = 2 machine cycles, l1 = 8, l2 = 8, m = 8, and n = 8, then
L = 64, M = 289, and

S1 ≈ 19,000 (10)

machine cycles. This is a very large value in view of the hardware arrangement. A high-speed calculation
system such as pipeline processing has been used in accordance with the frame cycle of the image
signal.
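The full-search cost can be recomputed with a few lines of Python. The relation M = (2m + 1) x (2n + 1) used below is inferred from the quoted values (m = n = 8 gives M = 289) and should be read as an assumption; the function name `full_search_cycles` is hypothetical.

```python
def full_search_cycles(l1, l2, m, n, a, b):
    """Return (L, M, S1) for a full search.

    L  : samples per block, l1 * l2
    M  : number of search blocks, assumed (2m + 1) * (2n + 1)
    S1 : L*M*a + M*b machine cycles, as in equation (9)
    """
    L = l1 * l2
    M = (2 * m + 1) * (2 * n + 1)
    S1 = L * M * a + M * b
    return L, M, S1
```

For l1 = l2 = 8, m = n = 8, a = 1 and b = 2 this yields L = 64, M = 289 and S1 = 19,074, which rounds to the value of about 19,000 given in the text.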

However, reducing the quantity of hardware remains a great problem. For instance, Japanese KOKAI
(Laid-open) patent application No. 63-181585, entitled "AN APPARATUS FOR MOTION
COMPENSATION INTER-FRAME ENCODING OF A TV SIGNAL", has proposed a method for
calculating a tree search type motion compensation so as to reduce an amount of calculations. Fig. 18 is an
illustration for explaining this method of motion compensation calculation. First blocks "○"
of low density are arranged at equal intervals to be searched within the motion vector search range 604.
When a block "○" giving the minimum distortion is detected, second blocks "□" to be searched are
positioned within a narrow region with this block "○" as a center thereof. In this narrow region, a block
"□" giving the minimum distortion is detected. Furthermore, third blocks "△" to be searched are set within
another region with this block "□" as a center thereof so as to detect a block "△" giving the minimum
distortion. Finally, the block "△" giving the minimum distortion within the motion vector search range 604
is specified.
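The three-stage procedure can be sketched in Python as follows. This is a minimal sketch, not the method of the cited application: the step sizes (4, 2, 1), the 3 x 3 candidate grid per stage, and the callback `distortion(dx, dy)` are all assumptions chosen for illustration.

```python
def three_step_search(distortion, steps=(4, 2, 1)):
    """Coarse-to-fine search for the motion vector with minimum distortion.

    distortion : function (dx, dy) -> distortion of the candidate block
    steps      : grid spacing of each stage (assumed values)
    Each stage evaluates 9 candidates centered on the best block so far.
    """
    cx, cy = 0, 0
    for s in steps:
        candidates = [(cx + i * s, cy + j * s)
                      for j in (-1, 0, 1) for i in (-1, 0, 1)]
        # Keep the candidate giving the minimum distortion as new center.
        cx, cy = min(candidates, key=lambda v: distortion(*v))
    return cx, cy
```

With a smooth, single-minimum distortion surface the search converges to the true minimum; when the coarse first stage picks a block far from the correct one, the later stages cannot recover, which is the weakness the document goes on to describe.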

An amount of the calculations "S2" in this case is expressed by:

S2 = (9 x L x a + 9 x b) x 3 (11)

As a result, under the same conditions as the above, it becomes
S2 ≈ 1,800 (12)
machine cycles.
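Equation (11) can be evaluated directly, as in the short Python sketch below. The function name `tree_search_cycles` and its default arguments (9 candidates per stage, 3 stages) are illustrative and simply mirror the constants in the equation.

```python
def tree_search_cycles(L, a, b, candidates=9, stages=3):
    """Machine cycles for the tree search: (9*L*a + 9*b) * 3 by default."""
    return (candidates * L * a + candidates * b) * stages
```

For L = 64, a = 1 and b = 2 the result is 1,782 machine cycles, which rounds to the value of about 1,800 given in the text and is roughly one tenth of the full-search count.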

Although a quantity of calculation according to this tree search type motion compensation calculating
method becomes small, the capability to detect the minimum distortion block is lowered as compared with
that of the full search type motion compensation calculating method. That is to say, there is a considerable
possibility that, at the matching process of the first search operation with the low density, a block is
selected whose position is apart from that of the correct block having the minimum
distortion. As a consequence, there are many cases in which the calculation result cannot reach the expected
minimum distortion amount and gives a decision of no correlation, resulting in a lower efficiency.

Since the conventional motion compensation calculating method has been arranged as described above, the
calculation amount becomes great if the full searching operation with high reliability is employed in the
motion compensation calculation, so that a large-scale hardware arrangement is required. On
the other hand, if the calculation amount is reduced by way of the tree searching method, the detection capability
of the minimum distortion block is lowered. As a consequence, there are problems of erroneous
detections and insufficient efficiency.



SUMMARY OF THE INVENTION 

The present invention has been made in an attempt to solve the above-described problems and
therefore has an object to provide a digital signal processor in which the number of the distortion
calculations is reduced and simultaneously an amount of calculation is reduced by outputting a minimum
distortion and a block number of a minimum distortion block, so that the processing time can be efficiently
shortened.

To achieve the above-described object, a digital signal processor according to the present invention
comprises:

a minimum distortion register for holding a minimum distortion;

a minimum distortion position register for holding a number of a block having said minimum distortion;
a block counter for holding a number of a block performing a present distortion calculation;
a comparator for comparing an accumulator output with a value of said minimum distortion register at every
cycle when, in order to detect the minimum distortion among "M" blocks in number (M being a positive
integer) of a data train, the distortion calculation is performed with respect to a k-th block (1 ≤ k ≤ M, k being
an integer) of the "M" blocks in number; and,

an instruction execution controlling unit capable of holding the minimum distortion
among "M" blocks in response to a predetermined instruction word from an instruction memory in such a
manner that an accumulation is interrupted during the accumulating operation when the output from the
accumulator exceeds the value of said minimum distortion register, the process is advanced to a
subsequent instruction or an instruction of a designated address, and when the accumulation is correctly
ended the value of said accumulating register is written into the minimum distortion register.

In the digital signal processor according to the invention, during the accumulating operation a
comparison is made between the accumulated data and the minimum distortion at every cycle. When the
comparison shows that the accumulated data exceeds the minimum distortion, the accumulation is
interrupted. The update of the minimum distortion and the update of the block number are performed
only for a block whose accumulation has been normally accomplished. As a result, a required calculation
amount is reduced and

Another object of the present invention is to provide a high-speed digital signal processor having a

To achieve the above object, a digital signal processor according to the present invention comprises:

an instruction execution control unit for controlling operations such as decoding and executing of an
instruction word which is read from an instruction memory in a predetermined order;

a calculation unit for performing various calculations on two input data which have been transferred from a

an internal data memory for storing a calculation result which has been

an external data memory connecting unit for reading data from an external data memory to said data bus
and for writing the data on said data output bus into said external data memory, by using values output
from an address generating unit which generates one output address value and two input address values in

a direct data transfer bus for connecting one port of said internal data memory to said external data

a direct data transfer control unit for inputting and outputting the data in units of block between
said external data memory connecting unit and said internal data memory
large region at the low-speed memory can be accessed in DMA.

A further object of the present invention is to provide a digital signal processor in which the required
calculation amount can be reduced to one half or less in a case where a data precision of one half or
less of the maximum data size is sufficient, so that the calculation capabilities can be increased and higher speed
calculation can be realized.

To achieve the above object, a digital signal processor is characterized in that when the required data
precision is smaller than, or equal to, a half of the maximum data size thereof, the input data is at first
multiplied in parallel by the multiplier circuit, and then the resultant data is shifted so as to execute the
arithmetic operation. By this arrangement of the multiplier circuit, the calculation speed can be increased.
In the multiplier circuit of the digital signal processor according to the invention, the data of the upper
half bit side of the input data and the data of the lower half bit side thereof are regarded as independent data,
their multiplications are processed in parallel in four channels, a shift process or a zero set process is
performed on the respective resultant data, and thereafter addition or subtraction of the
resultant data is executed, so that the calculation on the plural channels can be executed by the same
hardware at a speed two times higher than normal.
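
The use of four half-width products can be illustrated in software. The following Python sketch is not the circuit of the invention; it assumes unsigned operands and a half-word width of n = 8 bits, and the function names are hypothetical. It shows how the same four partial products yield either one double-precision product (shift and add) or two independent half-precision products (the cross terms being set to zero).

```python
def split_halves(word, n=8):
    """Split a 2n-bit word into its upper and lower n-bit halves."""
    mask = (1 << n) - 1
    return (word >> n) & mask, word & mask

def partial_products(x, y, n=8):
    """The four half-width products formed in parallel in hardware."""
    xh, xl = split_halves(x, n)
    yh, yl = split_halves(y, n)
    return xh * yh, xh * yl, xl * yh, xl * yl

def double_precision_multiply(x, y, n=8):
    """Full 2n x 2n product: shift the partial products and add them."""
    hh, hl, lh, ll = partial_products(x, y, n)
    return (hh << (2 * n)) + ((hl + lh) << n) + ll

def packed_parallel_multiply(x, y, n=8):
    """Two independent n-bit products; the cross terms are zero-set (discarded)."""
    hh, _, _, ll = partial_products(x, y, n)
    return hh, ll

print(double_precision_multiply(0x1234, 0xABCD) == 0x1234 * 0xABCD)  # True
print(packed_parallel_multiply(0x0302, 0x0405))                      # (12, 10)
```

In the packed case each input word carries two operands, so one pass through the multiplier produces two results, which corresponds to the doubling of speed described above.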

Still another object of the invention is to provide a digital signal processor capable of performing a
comparison process without interrupting a continuous process even while a series of continuous processing
operations is executed, whereby a branch processing operation can be realized at a better efficiency.

To achieve the above-described object, a digital signal processor according to the invention comprises:
a control circuit including a program counter for address-controlling a fetched instruction;
a data memory for inputting/outputting data;
and,

a data decision unit for selecting one of an output from an arithmetic calculator within a calculating unit, an
output from a logical shifter, and an output from a multiplier in parallel with an operation of the calculating
unit; for simultaneously comparing the selected output data with threshold values of "n" in number (n being
an integer not less than 1); for judging in which region said output data is present among data regions that
are subdivided into (n + 1) regions by said threshold values of "n" in number based upon comparison
results of "n" in number; for sequentially comparing said comparison result with region limiting conditions of
"m" in number (m being an integer not less than 1) for designating a preset data region; and for outputting
branch address information corresponding to a consistent region limiting condition among preset branch
addresses of "m" in number corresponding to said region limiting conditions of "m" in number in case that
one of said conditions is consistent, or for outputting a signal which indicates discrepancy in all of said
conditions in case that all of said conditions of "m" in number are discrepant.

In accordance with the data decision unit of the present invention, the parallel-comparison processing is
performed between a plurality of threshold values and the outputs from the multiplier unit per machine
cycle, and also a specific branch destination is selected from a plurality of branch destinations in
accordance with the comparison results, so that the continuous comparison decision can be performed
without interrupting the continuous process. As a result, a complex branch processing operation can be
controlled at a higher efficiency.
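
The region decision and branch selection described above may be sketched as follows. This Python fragment is an illustration only: the thresholds are assumed to be in ascending order, a value equal to a threshold is assumed to fall in the upper region, each region limiting condition is modelled as a set of permitted region numbers, and all names and addresses are hypothetical.

```python
def region_index(value, thresholds):
    """n comparisons against n thresholds give a region number 0..n."""
    comparisons = [value >= t for t in thresholds]   # done in parallel in hardware
    return sum(comparisons)

def decide_branch(value, thresholds, conditions):
    """Return the branch address of the first consistent condition, else None.

    conditions is a list of (permitted_regions, branch_address) pairs, checked
    in order; None stands for the signal indicating that all are discrepant.
    """
    region = region_index(value, thresholds)
    for permitted_regions, branch_address in conditions:
        if region in permitted_regions:
            return branch_address
    return None

thresholds = [10, 20, 30]                       # n = 3, hence 4 regions
conditions = [({0}, 0x100), ({2, 3}, 0x200)]    # m = 2
print(decide_branch(5, thresholds, conditions))   # 256
print(decide_branch(15, thresholds, conditions))  # None
```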

It is another object of the invention to provide a digital signal processor in which a lowering of the process
speed and an increase in the number of instruction steps are suppressed, and a perfect return from an
interruption is secured by restoring the respective register values which have been preserved at the start of
the interruption.

To achieve the above-described object, a digital signal processor according to the present invention
comprises:

a plurality of register preserving memories for preserving each of register data when the interruption is
performed;

an interruption controlling unit for correctly transferring data to each of said registers at returning from the
interrupting operation, and for controlling the complete recovery from the interrupting operation by restarting
executions by remaining repeat numbers even after returning from the interruption which has occurred in
the course of repeat processing; and,

an interruption enable controlling unit for forming an interruption inhibiting period to inhibit a H/W
interruption other than the interrupting process.

In the register preserving memories according to the invention, when the interruption is carried out, the
register values of the respective registers are written after the previously executed instruction is
accomplished. In the interruption controlling unit, the register values which have been written into said register
preserving memories are restored to the respective registers at the end of the interrupting operation, and
the repeat instruction can be executed by the remaining repeat numbers after returning from an interruption
which has occurred even during the repeat instruction execution. Further, the enable control unit can
improve the data processing capabilities of the digital signal processor by employing the interruption
inhibiting period during which the external interruption is inhibited in the course of waiting for the memory
subjected to the external data memory access, and in the course of decoding and executing a branch
instruction, a return instruction, and a software interrupt instruction.
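
The preservation and restoration of registers around an interruption during a repeat operation can be modelled in a few lines. The Python class below is a behavioural sketch only and does not represent the actual hardware; the register names, the class name and the way the interruption is triggered are all assumptions made for the illustration.

```python
class RepeatUnit:
    """Toy model: registers are preserved on interruption and restored on return."""

    def __init__(self):
        self.registers = {"acc": 0, "repeat": 0}
        self.preserved = None                      # register preserving memory

    def run_repeat(self, count, step, interrupt_at=None, handler=None):
        self.registers["repeat"] = count
        while self.registers["repeat"] > 0:
            if interrupt_at is not None and self.registers["repeat"] == interrupt_at:
                interrupt_at = None
                # Preserve after the previous instruction has been accomplished.
                self.preserved = dict(self.registers)
                handler(self.registers)            # the handler may overwrite registers
                self.registers = dict(self.preserved)   # restore on return
            self.registers["acc"] = step(self.registers["acc"])
            self.registers["repeat"] -= 1          # continue with the remaining repeats
        return self.registers["acc"]

def clobbering_handler(registers):
    """Hypothetical interrupt routine that destroys the register contents."""
    registers["acc"] = -999
    registers["repeat"] = 0

unit = RepeatUnit()
result = unit.run_repeat(5, lambda acc: acc + 2, interrupt_at=3,
                         handler=clobbering_handler)
print(result)  # 10
```

Although the handler overwrites both registers, all five repetitions are completed and the result equals that of an uninterrupted run.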

It is a further object of the invention to obtain a motion compensation calculating method by which the
calculation amount can be lowered without degrading the detecting performance of the minimum distortion
block, and simple and compact hardware can be realized.

To achieve the above-described object, the motion compensation calculating method according to the
invention comprises the following steps so as to subdivide a presently input frame of digital image data
constructed of a plurality of frames which have been sequentially input in the time sequence, into a plurality of
blocks, and to detect a motion vector and a block giving the minimum distortion by calculating an
inter-pattern analogy between each of the blocks of the image data in the presently input frame and respective
blocks of a previously input frame, said steps of:

for setting as a search small-region, a first motion vector search range having a predetermined size and
having, as a center thereof, a position of an input data block to be encoded which is a motion vector search
range in the previous frame data;

for equally subdividing this first search range into a plurality of regions to obtain motion vectors to be
calculated;

for allocating first search motion vector groups of "n" in number (n being an integer not less than 1) to the
respective regions at a low density;
for calculating a distortion of each of the motion vectors, which represents a pattern similarity degree
between the block data of the position indicated by this motion vector and the input data block functioning
as a presently input block, and for summing results corresponding to the motion vectors of "n" in number
to obtain the distortion amount within the region;

for detecting a region where the distortion amount becomes minimum within the first search region;

for setting as a minimum distortion region, a region where a distortion amount within this region becomes
minimum;

for setting as a limited search range, a second motion vector search range having a size smaller than that
of the first search range with the minimum distortion region as a center thereof;

for allocating second search motion vector groups at a higher density within the second search range; and
for detecting a block which is most similar to the input data block based upon a minimum distortion amount
with respect to the second motion vector groups, whereby both the block providing this minimum distortion
and the motion vector thereof are a final prediction signal and a motion vector.

In accordance with the motion compensation calculating method of the present invention, the motion
vector search range is subdivided into a plurality of search small-regions, a plurality of blocks to be
searched are allocated at a low density to every region, and the region where the sum of the distortion
amounts of the blocks for the motion vectors to be calculated becomes minimum is
detected as a minimum distortion region. Furthermore, with respect to this minimum distortion region, the
limited search range is set with high-density blocks to be searched, from which the motion vector is
detected. At first, the position where a minimum distortion block is expected to exist can be
estimated at high precision by comparing the distortion amounts in units of region, and thereafter the
high-density motion vector search operation is carried out within the region, so that a high
detecting precision is maintained while the calculation amount is suppressed.
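
The two-stage method summarized above can be sketched in software as follows. The Python code is a minimal illustration under stated assumptions, not the claimed implementation: the regions are given by their center vectors, the same sparse offsets are used in every region, the distortion is the sum of absolute differences, all candidates are assumed to lie inside the frame, and every name and numerical value is hypothetical.

```python
def sad(block, frame, y, x):
    """Sum of absolute differences between block and the frame area at (y, x)."""
    return sum(abs(frame[y + i][x + j] - block[i][j])
               for i in range(len(block)) for j in range(len(block[0])))

def hierarchical_search(block, prev, y0, x0, region_centers, sparse_offsets, fine_radius):
    """Returns (motion vector, distortion, number of blocks evaluated)."""
    evaluated = 0
    # First stage: sum the distortions of the sparse vectors of each region.
    best_center, best_sum = None, None
    for cy, cx in region_centers:
        total = 0
        for oy, ox in sparse_offsets:
            total += sad(block, prev, y0 + cy + oy, x0 + cx + ox)
            evaluated += 1
        if best_sum is None or total < best_sum:
            best_center, best_sum = (cy, cx), total
    # Second stage: dense search limited to the minimum distortion region.
    cy, cx = best_center
    best_mv, best_d = None, None
    for dy in range(cy - fine_radius, cy + fine_radius + 1):
        for dx in range(cx - fine_radius, cx + fine_radius + 1):
            d = sad(block, prev, y0 + dy, x0 + dx)
            evaluated += 1
            if best_d is None or d < best_d:
                best_mv, best_d = (dy, dx), d
    return best_mv, best_d, evaluated

# Hypothetical data: a bright 4 x 4 patch displaced by (4, -4) from the block position.
prev = [[0] * 32 for _ in range(32)]
for i in range(4):
    for j in range(4):
        prev[18 + i][10 + j] = 100
block = [[100] * 4 for _ in range(4)]
centers = [(-4, -4), (-4, 4), (4, -4), (4, 4)]
offsets = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
print(hierarchical_search(block, prev, 14, 14, centers, offsets, 3))  # ((4, -4), 0, 65)
```

In this example 65 blocks are evaluated, against 289 for a full search over a range of ±8, and the correct displacement is still found because the first stage compares sums over a whole region rather than a single sparse block.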






BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a schematic block diagram of a first conventional digital signal processor; 
Fig. 2 illustrates a relationship of data blocks;

Fig. 3 is a flowchart for explaining a detecting operation of a minimum distortion effected in the
conventional processor shown in Fig. 1;

Fig. 4 illustrates an amount of distortion calculations performed in the conventional processor; 
Fig. 5 is a schematic block diagram of a second conventional digital signal processor;
Fig. 6 is an access timing chart of an external data memory employed in the second conventional
digital signal processor;
Fig. 7 is a diagram of a DSSP chip employed in a third conventional digital signal processor;
Fig. 8 is a flowchart of a conventional sum-of-product process;
Fig. 9 is a flowchart of the conventional complex number multiplication process;

Fig. 10 is a flowchart of the conventional binary tree search vector quantizing process; 

Fig. 11 is a block diagram of a fourth conventional digital signal processor; 

Fig. 12 is a flowchart of the continuous calculating process containing the data decision in the fourth 
conventional digital signal processor; 

Fig. 13 is a block diagram of a fifth conventional digital signal processor; 

Fig. 14 is a timing chart for explaining the normal interrupting operation effected by the fifth 
conventional digital signal processor; 

Figs. 15 and 17 are explanatory diagrams for a conventional tree search type movement compensa- 
tion calculation method; 

Fig. 16 is a schematic block diagram of an image encoding transmission apparatus where a normal 
interframe encoding process has been performed; 

Fig. 18 is a diagram for explaining a conventional tree search type motion compensation calculating 
method; 

Fig. 19 is a block diagram of a digital signal processor according to a first preferred embodiment of
the present invention;

Fig. 20 is a flowchart for representing an operation of a minimum distortion detection effected in the 
first embodiment; 

Fig. 21 is a diagram for representing a distortion calculating amount according to the invention; 
Fig. 22 is a schematic block diagram for showing a digital signal processor according to a second 
preferred embodiment of the invention; 

Fig. 23 is a block diagram for representing an arrangement of a direct data transfer controlling unit
shown in Fig. 22;

Fig. 24 is a diagram for showing DMA transfer regions in an internal data memory and an external 
data memory; 

Fig. 25 is a diagram for representing a register arrangement example for setting external data 
memory access methods of a programmed transfer and a DMA transfer; 

Fig. 26 is a timing chart in case that the external data memory is accessed by the programmed and 
DMA transfer; 

Fig. 27 is a timing chart of the external data memory access in an external data memory connecting 
unit shown in Fig. 22; 

Fig. 28 is a detailed circuit diagram of a multiplier circuit of a digital signal processor according to a 
third preferred embodiment of the invention; 

Figs. 29a, 29b are state diagrams of shifters and others for representing operation contents of a 
double precision multiplication and a single precision parallel multiplication; 

Fig. 30 is a state diagram of shifters and others for representing operation contents of an n-bit data 
parallel sum-of-product calculation; 

Fig. 31 is a flowchart representing the calculation flow in Fig. 30;

Fig. 32 is a state diagram of shifters and others for illustrating operation contents of a single precision 
complex number calculation; 

Fig. 33 is a flowchart for explaining the calculation flow in Fig. 32; 

Fig. 34 is a state diagram of shifters and others for representing operation contents of the binary tree 
search vector quantizing calculation; 

Fig. 35 is a flowchart for showing the calculation flow in Fig. 34; 

Fig. 36 is a diagram for explaining a data multiplexing format in the data memory; 

Fig. 37 is a schematic block diagram of a digital signal processor as a whole according to a fourth 
preferred embodiment of the invention; 

Fig. 38 is a block diagram of an internal arrangement of a data decision unit; 

Fig. 39 is a block diagram for showing an internal arrangement of a condition decision unit; 

Fig. 40 is a diagram for explaining one example of data region decision;
Fig. 41 is a diagram for explaining conditional data representative of a branch condition;

Fig. 42 is a flowchart of the continuous calculation process containing the data decision;

Fig. 43 is a block diagram of a digital signal processor according to a fifth preferred embodiment of
the invention;

Fig. 44 is a timing chart for explaining the normal interruption operation of the present invention;
Fig. 45 is a timing chart for explaining the interruption operation during the repeat instruction
execution of the invention;

Fig. 46 is a diagram for explaining a motion compensation calculating method according to a
preferred embodiment of the invention; and,

Fig. 47 is a flowchart for explaining the motion vector detecting process. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

A description will be made on a first preferred embodiment of the present invention with reference to
the drawings.

Fig. 19 is a schematic block diagram of a digital signal processor according to the invention. It should
be noted that the same reference numerals are employed for denoting the same or similar components
shown in Fig. 1 and no further explanation thereof is made.

In Fig. 19, reference numeral 110 is a minimum distortion register for holding a minimum distortion; 111
is a comparator for comparing a value of the minimum distortion register 110 with an output of accumulator
7 and for outputting a comparison result to an instruction execution controlling unit 3; reference numeral 112
is a block counter for representing the number of the block whose accumulation is now being performed;
and 113 indicates a minimum distortion position register for holding the number of the block having the
minimum distortion.

Furthermore, reference numeral 101 indicates an input/output path between the data bus 5 and the block
counter 112; 102 is an output path from the minimum distortion position register 113 to the data bus 5; 103
is an output path for supplying an increment control signal from the instruction execution controlling unit 3
to the block counter 112; 104 is an output path for announcing the comparison result of the comparator 111 to
the instruction execution controlling unit 3; 105 represents an output path for supplying the output data of
the accumulator 7 to the comparator 111; 106 represents an output path for supplying the output data of the
minimum distortion register 110 to the comparator 111; 107 represents an update path from the accumulating
register 8 to the minimum distortion register 110; and 108 indicates an update path from the block
counter 112 to the minimum distortion position register 113.

Fig. 20 is a flowchart for explaining an operation to obtain a block number and a distortion
corresponding to a minimum distortion among blocks of "M" in number by employing the digital signal processor
shown in Fig. 19.

In response to an address output from the program counter 2, an instruction word is read from the
instruction memory 1 and input into the instruction execution controlling unit 3 via an output path 52. Based
on a decoded instruction, the instruction execution controlling unit 3 sends a control signal to the various
circuits.

When the instruction corresponds to the instruction of the minimum distortion detection
which is accompanied by accumulations such as the difference absolute value summation or the sum of
products, the data transfer of the read data from the data memory 4 to the data bus 5, the data transfer of
at most two pieces of output data from the data bus 5 to the calculator 6, and the accumulation in the
accumulator 7 using the output data of the calculator 6 and the output data
of the accumulating register 8 are performed.

The output data which is supplied via the output path 105 branched from the
output path 62 of the accumulator 7 is compared with the output data which is supplied from the minimum
distortion register 110 via the output path 106, by the comparator 111 every cycle (step ST102).

The comparison result obtained by the comparator 111 is announced to the instruction execution
controlling unit 3 every cycle. When the accumulation result of the accumulator 7 is greater than the value
of the minimum distortion register 110, namely if YES, the repeat counter 9 is reset to "0" and
simultaneously the value of the block counter 112 is incremented in response to
the increment control signal derived from the instruction execution controlling unit 3, and then the process
is advanced to the next step (steps ST103 and ST104).

When the accumulation operation is carried out by the number set in the repeat counter 9 and the
accumulation is normally accomplished, the value of the accumulating register 8 is written and updated into
the minimum distortion register 110 (step ST105); the value of the block counter 112 is written and updated
in the minimum distortion position register 113 (step ST106), and the block counter 112 is incremented by
the increment control signal 103 (step ST107).

When the minimum distortion block with respect to a block "A" of a certain data series is detected
among "M" pieces of blocks "y" to be searched in accordance with the above-described processing
sequence, assuming that the number of the accumulations for a k-th block is "Wk" (Wk is an integer, 1 ≤ Wk ≤
W), the sum-of-products process is performed Σ Wk times in total,
and both the minimum distortion and the block number of the minimum distortion are obtained
simultaneously with the accumulation. As a result, neither comparison nor update processing is required to
obtain this minimum distortion and the minimum distortion block number. As shown in Fig. 21, the
calculation processing time is shortened only to t x (Σ Wk).
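
The operation of Fig. 20 can be followed with a small software model. The Python function below is a behavioural sketch, not a description of the actual circuit: the variable names merely mirror the minimum distortion register 110, the minimum distortion position register 113 and the block counter 112, blocks are numbered from 0 for convenience, and the difference square summation is used as the distortion.

```python
def min_distortion_search(target, candidates):
    """Returns (minimum distortion, its block number, accumulation cycles used)."""
    min_distortion = float("inf")   # minimum distortion register
    min_position = None             # minimum distortion position register
    cycles = 0
    for block_counter, candidate in enumerate(candidates):
        accumulator = 0
        completed = True
        for t, c in zip(target, candidate):
            accumulator += (t - c) ** 2        # one accumulation per cycle
            cycles += 1
            if accumulator > min_distortion:   # comparator checks every cycle
                completed = False              # accumulation is interrupted
                break
        if completed:                          # normally accomplished: update both
            min_distortion, min_position = accumulator, block_counter
    return min_distortion, min_position, cycles

target = [1, 2, 3, 4]
candidates = [[5, 5, 5, 5], [1, 2, 3, 5], [9, 9, 9, 9], [1, 2, 3, 4]]
print(min_distortion_search(target, candidates))  # (0, 3, 13)
```

Here the third block is abandoned after a single cycle, so 13 accumulation cycles are used instead of the 16 that four complete accumulations would need, and no separate comparison or update pass is required.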

It should be noted that although the difference square summation has been employed as the distortion 
calculation according to the above preferred embodiment, either difference absolute values or inner 
products may be utilized. 

Also, the above-described criterion for the comparator is "whether or not the accumulated output from
the accumulator exceeds the value of the minimum distortion register"; however, another criterion may
be "whether the accumulated output from the accumulator exceeds, or is equal to, the
value of the minimum distortion register".

A description of a second preferred embodiment of the invention will now be made.

Fig. 22 is a schematic block diagram of a digital signal processor according to the present invention. It
should be noted that the same reference numerals will be employed for denoting the same or similar circuit
elements shown in Fig. 5 and no further explanation thereof will be made.

In the processor shown in Fig. 22, reference numeral 83 indicates a mode register for setting the
access method of the external data memory; 84 indicates a data output bus for outputting the calculation
result; and 85 is a direct data transfer bus.

Furthermore, reference numeral 211 is an input/output path of the data from the data bus 5 to the mode
register 83; 212 is an output path of the control signal from the mode register 83 to the external data
memory connecting unit 10; 263 indicates an input/output path of the data from the direct data
transfer bus 85 to the data memory 4; 264 indicates an input/output path of the data between the direct data
transfer bus 85 and external data memory connecting unit 10; and reference numeral 265 indicates
an output path of the data from the data output bus 84 to the external data memory connecting unit 10.

Fig. 23 is a schematic block diagram of an arrangement of DMAC 82 employed in Fig. 22. In Fig. 23,
reference numeral 231 indicates a frame horizontal size register (dmfhr) for representing a horizontal size of
a two-dimensional address space (domain); 232 denotes a block horizontal size register (dmbhr) for
representing a horizontal size of a rectangular portion within the two-dimensional address space; 233 denotes
a block start address register (dmbsr) for indicating a head address of the external data memory to execute
a DMA transfer; 234 represents an internal memory start address register (dmssr) for indicating a head
address of the internal data memory to execute the DMA transfer; 235 represents a word register (dmwcr)
to indicate the number of words of the DMA transfer; 236 indicates a DMAC register (dmcr) for selecting an
external address output mode at the DMA transfer, and the external memory; 237 is a DMA address
calculation unit; and reference numeral 238 is a DMA transfer controlling unit to control the DMA transfer.

Furthermore, reference numeral 271 is an input/output path of the frame horizontal size register 231;
272 is an input/output path of the block horizontal size register 232; 273 indicates an input/output path of
the block start address register 233; 274 is an input/output path of the internal memory start address
register 234; 275 is an input/output path of the word register 235; and reference numeral 276 is an
input/output path of the DMAC register 236.

In addition, reference numeral 277 indicates an output path for the internal data memory address of the 
DMA transfer from the DMA address calculation unit 237; 278 is an output path for the external data 
memory address of the DMA transfer from the DMA address calculation unit 237; and 279 denotes an 
output path for outputting the control signal such as a DMA transfer word number from the DMA transfer 
controlling unit 238 to the DMA address calculating unit 237. 

Fig. 24 illustrates an example of a transfer region of the DMA transfer performed between the internal 
data memory 4 and external data memory 241 by DMAC 82 shown in Fig. 23. 

Fig. 25 is a diagram for illustrating bit arrangements of the DMAC register 236 shown in Fig. 23 and the
mode register 83 shown in Fig. 22. In Fig. 25, symbol "A" denotes preliminary bits, symbol "B" indicates a
first bit of an address output mode, and symbol "C" indicates a zeroth bit of a memory connection mode.

Fig. 26 illustrates a timing example where the external data memory is accessed by the program and
by DMA.

An operation of the digital signal processor will now be described. The instruction word read from the
instruction memory 1 is input to the instruction execution controlling unit 3 via the input/output path 201. In



EP 0 373 291 A2 



response to the control signal decoded by this instruction execution controlling unit 3, the calculation data
is read from the internal data memory 4 to the data bus 5 via the output path 203, whereas the data from
the data bus 5 is input to the calculation unit 6 via the output path 204. The calculation processing result at
the calculation unit 6 is output to the data output bus 84 via the output path 205, the data from the
output path 206 is written to the internal data memory 4, and also the data from the data output bus 84 is
written into the external data memory connecting unit 10 via the output path 265.

Both the address of the input data which has been input from the internal data memory 4 via the output
path 204 and the write destination address in the internal data memory 4 of the output data which has been
output from the calculation unit 6 via the output path 205 to the data output bus 84 are controlled by the
address generating unit 8 having three-line address generators.

The address generating unit 8 generates addresses by using readable/writable data which has been supplied
from the data bus 5 via the input/output path 210, and the internal data memory 4 and the external
data memory connecting unit 10 are controlled by using the data output via the output paths 208 and 209,
respectively, so as to determine the write destinations of the input data and output data of the calculating
unit.

The access mode of the external data memory 241 by means of the external data memory connecting
unit 10 is determined by a value which has been set via the data bus 5 into the mode register 83 in
accordance with the instruction word read from the instruction memory 1.

When, on the other hand, data is set into the specific register of DMAC 82 via the data bus 5 based
upon the above-described instruction word, the DMA transfer is initialized. The external data memory
connecting unit 10 is controlled by DMAC 82 independently to carry out the data transfer between the
internal data memory 4 and external data memory 241 via the input/output paths 263 and 264 and the direct
data transfer bus 85.

The DMA transfer controlling unit 238 performs an initialization of the DMA transfer by means of the
data which has been set in the DMA address calculating unit 237 via the data bus 5. The DMA address
calculating unit 237 generates a two-dimensional block address 278 with respect to the address of the
external data memory 241, and also an ascending one-dimensional address 277 with respect to the internal
data memory 4, based upon the values of the frame horizontal size register 231, block horizontal size
register 232, block start address register 233 and internal memory start address register 234.

In the DMA transfer controlling unit 238, when the transfer of the DMA transfer word number which has
been set in the word register 235 is ended, a termination signal is sent to the DMA address calculating unit 237.
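
The address generation described above can be sketched as follows. The Python generator is an illustration under the assumption that the external address advances row by row through a rectangular block embedded in a frame of the given horizontal size, while the internal address simply ascends; the function and parameter names are hypothetical and only loosely follow the registers 231 to 235.

```python
def dma_addresses(frame_width, block_width, block_start, internal_start, word_count):
    """Yield (external address, internal address) pairs for one DMA transfer."""
    for i in range(word_count):
        row, column = divmod(i, block_width)                   # position in the block
        external = block_start + row * frame_width + column    # two-dimensional address
        internal = internal_start + i                          # ascending address
        yield external, internal

# A 2-line x 3-column block starting at address 5 of a frame 16 words wide.
for pair in dma_addresses(frame_width=16, block_width=3, block_start=5,
                          internal_start=100, word_count=6):
    print(pair)
```

The second line of the block starts at external address 21, one frame width after address 5, while the internal addresses run continuously from 100 to 105.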

As shown in Fig. 24, the above-described DMA transfer can be performed between an arbitrary
rectangular region (k-line x 1-column in Fig. 24) of the external data memory 241 starting from an arbitrary
address (address "t" in Fig. 24) and a region of the internal data memory 4 starting from an arbitrary address
(address "S" in Fig. 24).

As shown in Fig. 25, when both the zeroth bits of the mode register 83 and DMAC register 236, which
indicate the memory connecting mode, are "0", it is a waiting mode in which the processor waits until the
read/write completion signal from the external device is detected during the use of the low-speed memory.
On the contrary, when the zeroth bit indicating the memory connecting mode is "1", it is such a mode that
after the lower bits of the address are output, the read and write operations are accomplished in one
machine cycle.

When the first bit, which indicates the address output mode, is "0", both the upper and lower bits of the 
address are output in two machine cycles, whereas when this bit is "1", only the lower bits of the address 
are output in one machine cycle. 
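
A minimal sketch of how the two mode bits could be decoded is given below. The bit positions follow the text (zeroth bit: memory connecting mode, first bit: address output mode); the class and function names are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessMode:
    wait_for_completion: bool  # zeroth bit "0": low speed memory, wait for completion signal
    lower_address_only: bool   # first bit "1": only the lower address bits are output


def decode_mode(register_value: int) -> AccessMode:
    """Decode a mode register or DMAC register value into its two mode flags."""
    memory_connecting_bit = register_value & 1
    address_output_bit = (register_value >> 1) & 1
    return AccessMode(wait_for_completion=(memory_connecting_bit == 0),
                      lower_address_only=(address_output_bit == 1))


def address_output_cycles(mode: AccessMode) -> int:
    """Machine cycles spent on address output: one for lower only, two for upper and lower."""
    return 1 if mode.lower_address_only else 2
```

Because the program side and the DMA side each hold their own register, the same decoder can be applied to both values independently.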

By independently setting the mode register 83 and DMAC register 236, the external memory access 
from the program and from DMA can be independently carried out. 

In Fig. 26, there is shown an access timing example of the external data memory 241 in the case that "1" 
is set as both the address output mode and the memory connecting mode in the mode register 83 shown in 
Fig. 25, and "0" is set as both the address output mode and the memory connecting mode in the DMAC 
register 236. 

The access to the external data memory 241 from DMAC 82 is accomplished by detecting the 
read/write completion signal from the external device in the case of the low speed memory (n machine cycles 
in Fig. 26), whereas the external data memory access from the program is completed in 1 machine cycle. 

The external data memory access by DMA is continuously performed unless the external data memory 
access is effected by the program. Then, when the external data memory access is executed by the 
program, the operation by DMAC 82 is interrupted, and after the access operation by the program is 
accomplished, the process is restarted. 
connecting unit 10 shown in Fig. 22. It should be noted that the same reference numerals will be employed for 
denoting the same or similar circuit elements shown in Fig. 6, and no further explanation thereof is made. 

In Fig. 27, reference numeral 251 is a signal for controlling upper address timings when the address is 
output (referred to as an "AHE"); 252 is a signal for controlling lower address timings when the address is 
output (referred to as an "ALE"); 253 indicates a signal for announcing to the external device whether 
the external data memory access is executed by the processor or by DMA (referred to as a "P/D"); and 
reference numeral 254 is a read/write completion signal from the external device (referred to as a 
"DTACK"). 

When the high speed memory shown in Fig. 27(a) is used, AHE 251 is asserted in the first machine 
cycle and also the upper address is output from an external address terminal 291 of the address bus 78, 
both ALE 252 and RE 292 are asserted in the second machine cycle, and the data from an external data 
terminal 293 of the external data memory 241 is fetched at the trailing edge of the second machine cycle. 

When the low speed memory as shown in Fig. 27(b) is used, AHE 251 is asserted in the first machine 
cycle and also the upper address is output from the external address terminal 291 of the address bus 78, 
both ALE 252 and RE 292 are asserted in the second machine cycle, and RE 292 is negated and the data from 
the external data terminal 293 of the external data memory 241 is fetched at the trailing edge of a cycle 
where the external device asserts DTACK 254. Furthermore, ALE 252 is negated at the trailing edge of the 
cycle where DTACK 254 is negated. 

As above-described, the external data memory connecting unit 10 has the following features. 

(a). The connecting unit 10 includes two address output modes to the external data memory. In one 
address output mode, both the upper and lower addresses are output in two machine cycles, so that all of 
the external data memory regions can be accessed. In the other address output mode, only the lower address is 
output in one machine cycle, so that the specific region of the external data memory 241 can be 
accessed at a high speed. These two modes are changed by the value of the mode register set by an 
instruction. 

(b). It is possible to connect two types of external data memory 241. One is the high speed memory 
where, after the lower address is output, the read/write operation is accomplished in one machine cycle. The 
other is the low speed memory where the access waits until the read/write completion signal from the external 
device is detected. These two types are changed by the value of the above-described mode register. 

The direct data memory transfer unit has the following features. 

(c). In accordance with the direct memory control register set by an instruction, the above-described 
two address output modes and two types of external data memory connections are available independently 
of the external data memory access by an internal instruction based upon the value of the mode register. 

(d). The address designation with respect to the external data memory connecting unit is so arranged 
that the rectangular portion of k-lines by l-columns (k, l are integers) in the two-dimensional 
address space of m-lines by n-columns (m, n are positive integers) is sequentially designated. The 
address with respect to the internal data memory is designated from an arbitrary starting address in an 
ascending order, and the two-dimensional data transfer is performed between the external data memory and 
internal data memory. Further, when the data transfer is commenced, the transfer direction and transfer data 
number are designated by an instruction, so that the data input/output with the external data memory and the 
internal calculation process are executed in parallel in units of a rectangular block of k-lines by l-columns. 

It should be noted that in the above-described preferred embodiment, a description was made that the 
number of the external address terminals was 16 bits, however other terminal numbers may be utilized. 

It should also be noted that since there is no relationship between the essential points of the invention 
and detailed specifications of the above-described preferred embodiments, the contents of the invention are 
not restricted thereto. 

A third preferred embodiment of the invention will now be described with reference to the drawings. Fig. 
28 is a concrete arrangement of a multiplier circuit 303 according to the third preferred embodiment of the 
invention. In principle, the circuit arrangement of DSP according to the invention is the same as that of the 
conventional one described in Fig. 7. However, the arrangement of the multiplier circuit 303 is mainly 
different. 

In Fig. 28, reference numeral 320 indicates a register A as a first 2n-bit sized register, for inputting data 
X among two pieces of data X and Y which are simultaneously read out from the data memory 4; 321 is a 
register B as a second 2n-bit sized register, for inputting the data Y; reference numerals 322 and 323 
represent upper n-bits of the data X (referred to as "data A1") set in the register A and lower n-bits thereof 
(referred to as "data A0"), respectively; 324 and 325 denote upper n-bits (referred to as "data B1") of the 
data Y set in the register B, and lower n-bits thereof (referred to as "data B0"); 326, 327, 328 and 329 
represent a first multiplier (referred to as an "MPY1"), a second multiplier (referred to as an "MPY2"), a 
third multiplier (referred to as an "MPY3") and a fourth multiplier (referred to as an "MPY4") for multiplying 
the data A1 and B1; the data A0 and B1; the data A1 and B0; and the data A0 and B0 in parallel, 
respectively; reference numerals 330, 331, 332 and 333 represent a first shifter (referred to as a "shifter 
1"), a second shifter (referred to as a "shifter 2"), a third shifter (referred to as a "shifter 3") and a fourth 
shifter (referred to as a "shifter 4") for performing a shift process or zero set in accordance with a 
microprogram in the program memory 1 with respect to the outputs from MPY1, MPY2, MPY3 and MPY4, 
respectively; 334, 335, 336 and 337 are output data from the first to fourth shifters 330 to 333, respectively; 
338 and 339 denote a first arithmetic calculator (referred to as an "AU1") and a second arithmetic calculator 
(referred to as an "AU2") for inputting therein the outputs from the shifters 1 and 4 or the shifters 2 and 3, 
respectively, and for summing or subtracting these outputs in accordance with the microprogram; and 
reference numeral 340 indicates a third arithmetic calculator for inputting therein the outputs from AU1 and 
AU2 and for summing or subtracting these outputs in accordance with the microprogram so as to output the 
final calculation resultant data of 4n-bits to the calculating unit 6. 

An operation will now be described. The data input/output in the data memory 4, and various calculation 
processes at the multiplier circuit 303 and calculating circuit 6 as shown in Fig. 7 in detail, are executed in 
such a manner that the control circuit 3 reads the microprogram in the program memory 1, the instructions 
thereof are decoded, and the pipeline process is carried out in response to the control signal based on the 
decoded instructions. Here, the data size is 2n-bits at a maximum size; n-bit data will be 
referred to as single precision data, and 2n-bit data will be referred to as double precision data. 

The multiplication system instructions based upon the microprogram include various instructions, such 
as a double precision multiplication (2n-bits x 2n-bits) for multiplying 2n-bit data with each other, a single 
precision multiplication (n-bits x n-bits) for multiplying n-bit data with each other, a single precision sum-of- 
products, a single precision complex number multiplication, and a binary tree vector quantizing multiplication. 

In this case, in the multiplier circuit 303 shown in Fig. 28, each part thereof will be operated in 
response to the control signal corresponding to the sorts of the above instructions as follows. That is, two 
pieces of data simultaneously read from the data memory 4 are supplied to the 
selectors 301 and 302, the data X is set into the register A and the data Y is set into the register B. It 
should be noted that both the data X and Y are 2n-bit sized data at the maximum. 

The upper n-bit data A1 of the data X and lower n-bit data A0 thereof which have been set into the 
register A are supplied to MPY 1, MPY 3 or MPY 2, MPY 4, respectively. Also, the upper n-bit data B1 and 
lower n-bit data B0 of the data Y which have been set into the register B are supplied to MPY 1, MPY 2 or 
MPY 3, MPY 4, respectively. As a consequence, MPY 1 multiplies the data A1 by B1, MPY 2 multiplies the 
data A0 by B1, MPY 3 multiplies the data A1 by B0, and MPY 4 multiplies the data A0 by B0 in parallel, 
and the respective 2n-bit sized resultant data are supplied to the shifter 1, shifter 2, shifter 3, and shifter 4. 
As to the resultant data input into the respective shifter 1 to shifter 4, the shift process or zero set process 
is carried out in accordance with sorts of the instruction. Thus, the output data 334 to 337 of 4n-bits derived 
from the respective shifters 1 to 4 are input into AU1 and AU2. 

AU1 performs the summation or subtraction on the data from the shifters 1 and 4, and the resultant data 
is supplied to AU3. AU2 performs the summation or subtraction on the data from the shifters 2 and 3, and 
supplies the resultant data to AU3. AU3 furthermore executes the summation or subtraction on the data 
supplied from AU1 and AU2, and thereafter sends the resultant data as the 4n-bit final calculation resultant data 
to the calculating unit 6. 

A description of a required amount of calculation on the various calculating modes will now be made. 



(1). A double precision multiplication. 



Fig. 29(a) represents a diagram for showing operation contents of the shifters 1 to 4 and AU1 to AU3 in 
this case. That is, in the shifter 4, the shift value 0 is processed, and the n-bit left shift process is performed 
in the shifters 2 and 3; further, the 2n-bit left shift process is performed in the shifter 1. In AU2, the summation 
is carried out, the summation is executed in AU1, and the summation is performed in AU3, whereby the 
double precision multiplication is performed. In this case, a required amount of calculation is 1 machine 
cycle per 1 data, which is the same as that of the conventional apparatus. 
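
The double precision mode can be sketched as a sum of four shifted partial products. The sketch below assumes unsigned operands for simplicity (sign handling is not modelled), and the function names are illustrative only.

```python
def split_halves(word, n):
    """Split a 2n-bit word into its upper and lower n-bit halves."""
    mask = (1 << n) - 1
    return (word >> n) & mask, word & mask


def double_precision_multiply(x, y, n):
    """Multiply two unsigned 2n-bit words using four n-bit by n-bit products."""
    a1, a0 = split_halves(x, n)
    b1, b0 = split_halves(y, n)
    shifter1 = (a1 * b1) << (2 * n)  # MPY1 output, 2n-bit left shift
    shifter2 = (a0 * b1) << n        # MPY2 output, n-bit left shift
    shifter3 = (a1 * b0) << n        # MPY3 output, n-bit left shift
    shifter4 = a0 * b0               # MPY4 output, shift value 0
    au1 = shifter1 + shifter4        # AU1 sums shifters 1 and 4
    au2 = shifter2 + shifter3        # AU2 sums shifters 2 and 3
    return au1 + au2                 # AU3 forms the 4n-bit result
```

With n = 8, for example, `double_precision_multiply(0xABCD, 0x1234, 8)` gives the same value as the direct product `0xABCD * 0x1234`.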



(2). A single precision parallel multiplication. 
In Fig. 29(b), there are shown the operation contents of the shifters 1 to 4 and AU1 to AU3 in this case. 
In this case, it should be noted that as the single precision data, two pieces of data have been previously 
stored in the data memory 4 having 2n-bit data sizes by way of the multiplex as shown in Fig. 36. Then, 
both the multiplication result (A1 x B1) on the upper n-bit input data, and multiplication result (A0 x B0) on 
the lower n-bit input data are obtained with the respective MPY 1 and MPY 4. Thereafter, the shift value 0 is 
processed in the shifter 4, the 0-set is performed in the shifters 3 and 2, and the 2n-bit shift is carried out in 
the shifter 1. Then, additions on the data are performed in AU1, AU2, and AU3, so that the single precision 
multiplication results are multiplexed into a resultant 4n-bit data as an upper 2n-bit and lower 2n-bit data. In 
this case, the required calculation amount becomes 0.5 machine cycles per 1 data, which is at a speed of 
two times higher than that of the conventional apparatus. 
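
As an illustration of this mode, the sketch below packs two unsigned n-bit values into each 2n-bit operand and produces both products in one pass. Unsigned data and the function name are assumptions made for the example.

```python
def parallel_single_precision_multiply(x, y, n):
    """Return (word, upper, lower): two n-bit products packed in one 4n-bit word."""
    mask = (1 << n) - 1
    a1, a0 = (x >> n) & mask, x & mask
    b1, b0 = (y >> n) & mask, y & mask
    shifter1 = (a1 * b1) << (2 * n)  # upper product, 2n-bit left shift
    shifter2 = 0                     # zero set
    shifter3 = 0                     # zero set
    shifter4 = a0 * b0               # lower product, shift value 0
    word = (shifter1 + shifter4) + (shifter2 + shifter3)  # AU1, AU2, then AU3
    upper = word >> (2 * n)                # A1 x B1
    lower = word & ((1 << (2 * n)) - 1)    # A0 x B0
    return word, upper, lower
```

Since an unsigned n-bit by n-bit product always fits in 2n bits, the lower product cannot spill into the upper field in this sketch.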



(3). A single precision parallel sum-of-product calculation. 

Fig. 30 illustrates the operation contents of this calculation. Fig. 31 shows a calculation flow. Also, in 
this case, the single precision data has been multiplexed as shown in Fig. 36. After the initialization is set at 
the step ST 331, in the parallel sum-of-product calculation process of step ST 332, the following processes 
are executed. That is, the shift value 0 is processed for the multiplication result (A0 x B0) of the lower n-bit 
data of two pieces of input data in the shifter 4; "0" set is performed in the shifters 3 and 2; the shift 
process of the shift value "0" is performed with respect to the multiplication result (A1 x B1) of the upper 
n-bit in the shifter 1. In AU1, an addition of (A0 x B0) + (A1 x B1) is carried out. In AU2, an addition of (0 + 
0) is effected, and further another addition of (A0 x B0) + (A1 x B1) + 0 is performed in AU3. As a result, 
an accumulation value of two single-multiplication-resultant-data is obtained. Then, this accumulation value 
is furthermore accumulated in the post-staged calculating unit 6 by M/2 times repeatedly by way of the 
process defined by the step ST 333. Thus, the sum of products containing M pieces of data is executed. 
In this case, a required calculation amount becomes 0.5 machine cycles per 1 output data, which is at a 
speed of two times higher than that of the conventional calculation. 
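
A minimal sketch of the parallel sum-of-products follows, assuming unsigned n-bit values packed two per word. The helper `pack_pairs` and the function names are hypothetical and only serve the example.

```python
def pack_pairs(values, n):
    """Pack consecutive pairs of n-bit values into 2n-bit words (first value in the upper half)."""
    return [(values[i] << n) | values[i + 1] for i in range(0, len(values), 2)]


def parallel_sum_of_products(xs, ys, n):
    """Accumulate A1 x B1 + A0 x B0 over M/2 packed words, giving a sum over M products."""
    mask = (1 << n) - 1
    accumulator = 0
    for x, y in zip(xs, ys):
        a1, a0 = (x >> n) & mask, x & mask
        b1, b0 = (y >> n) & mask, y & mask
        au1 = a1 * b1 + a0 * b0   # shifters 1 and 4, both with shift value 0
        au2 = 0 + 0               # shifters 2 and 3 are zero set
        accumulator += au1 + au2  # AU3 output accumulated in the calculating unit
    return accumulator


total = parallel_sum_of_products(pack_pairs([1, 2, 3, 4], 8),
                                 pack_pairs([5, 6, 7, 8], 8), 8)
```

Here four products are accumulated with only two passes through the loop, which mirrors the 0.5 machine cycles per data stated in the text.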

(4). A single precision complex number calculation. 

In Fig. 32, there are shown the operation contents of this calculation. In Fig. 33, there is shown a 
calculation flow thereof. In this case, it is assumed that a real number part multiplexed into the upper n-bits 
and an imaginary number part multiplexed into the lower n-bits of data have been stored in the data 
memory 4. Thus, after the initialization defined by the step ST 341 has been effected, the complex number 
calculating process of step ST 342 is performed as follows. As shown in Fig. 32, a 2n-bit left shift operation 
is performed in the shifter 4 for the multiplication result (A0 x B0); a shift value "0" is processed for (A1 x 
B0) in the shifter 3; a shift value "0" is processed for (A0 x B1) in the shifter 2; a 2n-bit left shift operation is 
performed for (A1 x B1) in the shifter 1. Then, a subtraction of (A1 x B1 - A0 x B0) is performed in AU1, an 
addition of (A1 x B0 + A0 x B1) is effected in AU2, and another addition of (A1 x B1 - A0 x B0) + (A1 x B0 
+ A0 x B1) is carried out in AU3. As a result, the resultant data is obtained in such a form that the real 
number part of the complex number multiplication result is multiplexed into an upper 2n-bits, and the 
imaginary number part thereof is multiplexed into a lower 2n-bits. In this case, a required calculation amount 
becomes 1 machine cycle per 1 data, which is five times higher than the conventional calculation speed. 
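
The arithmetic performed by AU1 and AU2 in this mode can be sketched as below. To keep the example short, the real and imaginary parts are passed and returned as separate signed integers; the packing into the upper and lower 2n-bit fields by the shifters and AU3 is not modelled, and the function name is hypothetical.

```python
def complex_multiply(a, b):
    """Multiply two complex values given as (real, imaginary) integer pairs.

    a = (A1, A0) and b = (B1, B0), following the upper/lower naming in the text.
    """
    a1, a0 = a
    b1, b0 = b
    real = a1 * b1 - a0 * b0       # AU1: A1 x B1 - A0 x B0
    imaginary = a1 * b0 + a0 * b1  # AU2: A1 x B0 + A0 x B1
    return real, imaginary


# (3 + 4j) * (1 - 2j)
product = complex_multiply((3, 4), (1, -2))
```

All four partial products are used exactly once, which is why one pass through the four multipliers suffices for one complex multiplication.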

(5). A binary tree retrieval vector quantization calculation. 

Fig. 34 shows the operation contents of this calculation, and Fig. 35 represents an operation flow thereof. 
In this case, it is assumed that one of two pieces of input data has been stored in the data memory 4 in a 
multiplexed format for every element of the binary tree search vectors. 

An element of vector "y0" is stored in the upper n-bits of one input data "A", an element of vector "y1" 
is stored in the lower n-bits thereof; and an element of vector "x" is stored in the lower n-bits of the other 
input data "B". Thus, in a step ST 352, a "0" shift is performed for the multiplication result (A0 x B0) in the 
shifter 4; a "0" shift is performed for the multiplication result (A1 x B0) in the shifter 3; a "0" set is done in 
the shifters 2 and 1. Also, an output from the shifter 4 is subtracted from an output from the shifter 1 in 
AU1; the output from the shifter 3 is added to an output from the shifter 2 in AU2; and an output from AU1 
is added to an output from AU2 in AU3. As a result, the resultant data (y0i x xi) - (y1i x xi) of the multiplier 
circuit 303 is obtained, an accumulation is performed in the post-staged calculating unit 6, and this 
accumulation is repeated by k times corresponding to the element number, so that the following resultant 



data are obtained: 

$$\sum_{i=1}^{k} \left[ (y_{0i} \times x_i) - (y_{1i} \times x_i) \right]
= \sum_{i=1}^{k} (y_{0i} \times x_i) - \sum_{i=1}^{k} (y_{1i} \times x_i)
= d_0 - d_1 \qquad \ldots (13)$$


where 

d0: an inner product between the reference vector "y0" and input vector "x". 
d1: an inner product between the reference vector "y1" and input vector "x". 

Then, in a step ST 353, a matching decision is made by judging whether the above-described 
accumulated value (d0 - d1) is positive or negative. Thus, a required calculation amount per one stage 
becomes (k + 1) machine cycles, which is approximately at a speed of two times higher than that of the 
conventional calculation. 

It should be noted that in the above-described preferred embodiment, AU1 to AU3 are employed as 
the arithmetic calculators, and a mere adder may be utilized for AU2 and AU3. 
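
The accumulation of equation (13) for one stage of the binary tree search can be sketched as follows. Vectors are passed as plain lists of signed integers rather than packed words, and the function name is hypothetical; how the sign of the result selects a branch is left to the caller, since the text only states that the sign is judged.

```python
def accumulate_difference(y0, y1, x):
    """Return d0 - d1, the difference of the inner products of y0 and y1 with x."""
    accumulator = 0
    for y0_i, y1_i, x_i in zip(y0, y1, x):
        shifter3 = y0_i * x_i   # A1 x B0, shift value 0
        shifter4 = y1_i * x_i   # A0 x B0, shift value 0
        au1 = 0 - shifter4      # shifter 1 is zero set
        au2 = shifter3 + 0      # shifter 2 is zero set
        accumulator += au1 + au2
    return accumulator


difference = accumulate_difference([1, 2], [3, 1], [2, 5])
is_positive = difference > 0
```

One pass over the k elements yields both inner products at once, instead of computing d0 and d1 in two separate passes.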

A description of a fourth preferred embodiment according to the invention will now be made. In Fig. 37, 
it should be noted that the same reference numerals will be employed for denoting the same or similar circuit 
elements shown in Fig. 11, and a further explanation thereof is omitted. 

In Fig. 37, reference numeral 411 indicates a data decision device; 412 denotes an input/output path 
connecting the data bus 5 and data decision device 411; and 413 denotes an output path from the 
calculating unit 6 to the data decision device 411. 

Fig. 38 is a block diagram of an internal arrangement of the above-described data decision device 411. 
In Fig. 38, reference numeral 415 is a threshold register group; 417 is a comparator group for comparing the 
input data with the threshold values; 419 represents a condition decision device for judging the region 
of the data from the comparator output so as to compare a branch condition with the 
region; 420 is a condition register for holding the branch condition and address index 
information indicating a destination; 424 indicates an address register file for holding a plurality of 
branch addresses corresponding to the conditions of the condition register; 412 represents an 
input/output path; and reference numerals 413, 414, 416, 418, 421 and 422 are output paths. 

Fig. 39 is a block diagram of an internal arrangement of the condition decision device 419. In Fig. 39, 
reference numeral 426 indicates a region decision circuit; 428 is a condition comparing circuit; and 
reference numerals 418, 421, 422 and 427 are output paths. 

An operation will now be described. In Fig. 37, the data decision device 411 compares the data to be 
compared, which is input from the calculating unit 6 via the output path 413, with n pieces of threshold 
values which are supplied from the previously set threshold value register group 415 via the output paths 
416, in the comparator group 417, and judges the data region of the data in question in the condition decision 
device 419 based on the "n" pieces of comparison results (each comparison result is represented by one bit data) [...]. 

[...] a0 ~ a9 divides, the comparator outputs and region decision are shown. In this case, a specific [...] is set 
[...]. The condition decision device 419 judges the region of the data based [...]. 

[...] of the condition signal stored in the condition register 420. In 
Fig. 41, symbols f0 to f4 denote a region "0" designation flag to a region "4" designation flag, each of which 
becomes "1" at the designation, and "0" at the non-designation. A plurality of conditions 1 to m can be 
designated, a priority order of the conditions to be compared is set, and these conditions are sequentially 
compared. Once a condition is satisfied, an address index signal is output from the condition decision 
device 419 via the output path 422. 

Into the address register file 424, a plurality of branch addresses corresponding to the respective 
conditions have been stored, and the branch address signal is output to the output path 414 based upon the 
address index signal supplied from the condition decision device 419 via the output path 422. As a 
consequence, based upon the output address value, the control circuit 3 performs the branch operation by 
setting a count value of the program counter built therein to this address value. 

In case that none of the conditions is satisfied, the above-described address index signal is "OFF", 
the address signal output from the address register file 424 is also "OFF", and the count value of the 
program counter points to a next instruction address. 
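
The decision path from comparators to branch address can be sketched as follows. Several details are assumptions made only for this example: each comparator is taken to output 1 when the data is greater than or equal to its threshold, the thresholds are taken to be in ascending order so that the region number equals the count of comparators outputting 1, and all names are hypothetical.

```python
def comparator_outputs(value, thresholds):
    """One bit per threshold; 1 when value >= threshold (assumed comparator polarity)."""
    return [1 if value >= t else 0 for t in thresholds]


def region_of(outputs):
    """Region number for ascending thresholds: the count of comparators outputting 1."""
    return sum(outputs)


def select_branch(value, thresholds, conditions, address_file):
    """Return the branch address of the first satisfied condition, or None.

    conditions: list of region designation flag lists, in priority order.
    address_file: branch addresses indexed like the conditions.
    None models the "OFF" case, where the program counter simply advances.
    """
    region = region_of(comparator_outputs(value, thresholds))
    for index, flags in enumerate(conditions):
        if flags[region] == 1:
            return address_file[index]
    return None


thresholds = [10, 20, 30, 40]                     # four thresholds, five regions
conditions = [[1, 0, 0, 0, 0], [0, 0, 0, 1, 1]]   # flags f0 to f4 per condition
address_file = [0x100, 0x200]
```

For instance, a value of 35 falls in region 3 and selects the second condition's address, while a value of 15 falls in region 1 and satisfies no condition.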

As to the data supplied from the calculating unit 6 via the output path 413, which is to be checked by 
this data decision device 411, one of the outputs from the arithmetic calculator, multiplier, and accumulator within 
the calculating unit 6 is defined by the instruction such as the mode setting operation, and a check is made 
by the data decision device 411 every machine cycle, so that a loss of the processing time required for 
comparing the data with the data regions can be prevented. 

Fig. 42 represents a continuous processing flow containing an intermediate check. First, initializations 
such as a selection of objects to be compared, a threshold value data set, a branch address set, and a 
branch condition set, are performed (step ST 401). Then, both calculation process and condition decision 
process are repeatedly performed in parallel via the process data loop by the number of processing data, 
and the addresses A to C are output when the conditions 1 to 3 are satisfied. 

Referring now to the drawings, a fifth preferred embodiment of the present invention will be described. 
Fig. 43 is a block diagram of the digital signal processor according to the fifth preferred embodiment of 
the invention. It should be noted that in Fig. 43, the same reference numerals will be employed for denoting the 
same or similar circuit elements shown in Fig. 13, and therefore, no further explanation is made. In Fig. 43, 
reference numeral 516 represents a register preserving memory for preserving the data stored in the 
respective registers during the execution of the interruption; 517 is a repeat flag register (rfr) for 
representing that the repeat instruction is under execution; 518 represents a repeat flag stack (rfsk) 
functioning as a memory for preserving data when the interruption is accepted; 519 denotes a rear repeat 
counter (rch) for holding a number of an initial value of repeating; and 520 indicates an interrupt enable 
controlling unit for performing an automatic disable process of interruptions when the interruption is 
initialized. 

The register preserving memory 516 holds the values of the registers that need to be preserved for 
an interrupt processing routine. The interrupt enable controlling unit 520 automatically inhibits a H/W 
interrupt during an access to the external data memory and during executions of a branch instruction, return 
instruction, and S/W interrupt instruction. 

Referring now to Fig. 43, a H/W interruption process operation will be described. When the interruption 
is demanded in an external device, the external device announces an occurrence of the interruption request 
to the interruption controlling unit 513 in response to the interruption request signal 514. 

Upon accepting the interruption, the interruption request is output from the interruption controlling unit 
513 to the sequence controlling unit 505. Upon receipt of this interruption request, a non-operation 
instruction is set to the instruction execution controlling unit 3, and the update operation of the program 
counter 2 is prohibited. 

Thereafter, an interruption acknowledgement signal 515 is sent from the interruption controlling unit 513 
to the external device, and in principle, the H/W interruption is prohibited during the interrupt operation. 

It should be understood that the instruction is replaced in the sequence controlling unit 505 by an 
instruction that performs no operation. Interruptions other than the interruption under processing are 
automatically disabled by the interrupt enable controlling unit 520, e.g., during the memory wait cycle 
while the external data memory is accessed, and also during decoding of branch 
instructions, return instructions, and S/W interrupt instructions. 

Upon receipt of the interruption instruction, the non-operation instruction is set to the instruction 
execution controlling unit 3, the count value of the program counter 2 is automatically pushed onto the PC 
stack 504 and also an interrupt address is set to this program counter. 

In case of the interrupt operation during the repeat operation, it is furthermore required to store a 
condition of a repeat flag register 517. The register value of the repeat flag register 517 is automatically 
preserved into the repeat flag stack 518 in order to accept the interruption instruction even during the 
execution [...]. The preservation of the register values of the registers used in the interrupt processing routine 
is carried out at the designated interrupt address by a register preserving instruction (push). The return 
operation from the interrupt operation is effected in response to a return instruction (rti). Before the 
execution of the return instruction, the register values obtained before the interrupt process routine are reset 
into the respective registers in response to a register value return instruction (pop) at the designated 
[...]. 

Thereafter, the return instruction is executed to return from the interrupt operation. In this case, the 
count value of the program counter 2 is popped from the PC stack 504, the non-operation instruction is set to 
the instruction execution controlling unit 3, and thereafter, the register value of the repeat flag register 517 is 
[...]. 
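
The automatic save and restore of the program counter and repeat flag can be sketched as a small state model. This is a simplified illustration only: the class and method names are hypothetical, the non-operation instruction and the register preserving memory 516 are not modelled, and restoring the repeat flag from the repeat flag stack on return is assumed from the role given to that stack in the description.

```python
class InterruptModel:
    """Toy model of the PC stack 504 and repeat flag stack 518 behaviour."""

    def __init__(self, program_counter=0, repeat_flag=0):
        self.program_counter = program_counter
        self.repeat_flag = repeat_flag
        self.pc_stack = []
        self.repeat_flag_stack = []

    def accept_interrupt(self, interrupt_address):
        # Push the current count value and repeat flag automatically.
        self.pc_stack.append(self.program_counter)
        self.repeat_flag_stack.append(self.repeat_flag)
        # Set the interrupt address to the program counter.
        self.program_counter = interrupt_address

    def return_from_interrupt(self):
        # Pop the saved values so that an interrupted repeat can resume.
        self.program_counter = self.pc_stack.pop()
        self.repeat_flag = self.repeat_flag_stack.pop()


model = InterruptModel(program_counter=42, repeat_flag=1)
model.accept_interrupt(0x10)
saved_during_interrupt = (list(model.pc_stack), list(model.repeat_flag_stack))
model.return_from_interrupt()
```

After the return, the model is back at the interrupted address with the repeat flag as it was before the interruption.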

[...] explaining the normal interrupt operation. Fig. 45 is another timing chart for [...] 
value of the rear repeat counter 519 so as to perform the repeat setting operation [...] 

[...] equal to "0". If zero, this instruction is performed. 

[...] repeat counter [...] is preserved to the register preserving memory 516 in response to the register preserving 
[...]. Before the interrupt process routine is accomplished, both the preserved count value of the repeat [...] 

[...] execution controlling unit 3. Thereafter, the data before the interruption is popped from the repeat flag stack 
[...] 
[...] set in the repeat flag [...] 

[...] by "1" to become "1", and the repeat instruction is again executed. 

In the above-described interrupt operation, the processor can be completely returned by processing the 
[...]. 

It should be noted that the repeat operation number was four in the above-described preferred 
embodiment [...] a motion compensation [...] method according to [...] 

[...] At this time, it is assumed 
that a total number of these motion vectors 611 to be distortion-calculated is "e". An amount of inter-block 
distortion "dq" (q = 1 to e) between the block of the position of this motion vector 611 and the presently 
input block 603 is calculated (step ST 601) and a total thereof is assumed as an intra-region distortion 
amount "Dj" (j = 1 to R) of this search small-region 610. 

In this case, since the following equation (14) is satisfied, i.e., 

$$D_j = \sum_{q=1}^{e} d_q = \sum_{q=1}^{e} \sum_{p=1}^{L} \left| x_p - y_{qp} \right| \qquad \ldots (14)$$

a calculation amount per one search small-region 610 is expressed in units of machine cycle as follows: 
(e x L x a) .... (15) 

The above-defined calculation is carried out over all the search small-regions 610 so as to detect a 
minimum distortion region 612 having a minimum intra-region distortion amount "D min" (step ST 602). At 
this time, a calculation amount is equal to: 
[(e x L x a) x R + R x b] .... (16) 

Then, as illustrated on the moving vector detecting step in Fig. 46(b), the limited search range 613 
having a size of k1 x k2 with the minimum distortion region 612 obtained in the region decision step as a 
center is set, and the motion vectors to be searched at the higher density are positioned within this search 
range 613 (step ST 603). A calculation amount within this limited search range 613 is obtained by summing 
the following items (17) and (18). 
[(k1 x k2) x L x a] .... (17) 
(k1 x k2) x b .... (18) 

The item (18) is obtained by the comparison process. 

Assuming that the total number "R" of the search small-regions 610 is equal to nine (9), the number 
"e" of the motion vectors 611 to be calculated within the search small-region 610 is equal to four (4), and 
the values of k1 and also k2 in the limited search range 613 are equal to six (6), the total calculation amount is 
defined in units of machine cycle as follows: 
S = [(e x L x a) x R + R x b] + (k1 x k2) x L x a + (k1 x k2) x b ≈ 4,800 .... (19) 

As a consequence, the resultant calculation amount is reduced to approximately 1/4 of a calculation amount 
of full searching. 
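
Expressions (16) to (19) can be evaluated with a short helper such as the one below. The function name is hypothetical, and the values of L, a and b used in the example call are placeholders chosen only for illustration; they are not taken from the specification.

```python
def hierarchical_search_cycles(R, e, k1, k2, L, a, b):
    """Total calculation amount S of expression (19), in machine cycles."""
    # Region decision step, expression (16).
    region_stage = (e * L * a) * R + R * b
    # Motion vector detecting step, items (17) and (18).
    vector_stage = (k1 * k2) * L * a + (k1 * k2) * b
    return region_stage + vector_stage


# R, e, k1 and k2 follow the text; L, a and b are illustrative placeholders.
example_total = hierarchical_search_cycles(R=9, e=4, k1=6, k2=6, L=64, a=1, b=4)
```

The helper makes it easy to see how the two stages contribute: the first term grows with the number of small-regions and coarse vectors, the second with the size of the limited search range.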

It should be noted that although the range limitation by the searching operation at a low density was 
one stage in the above-described preferred embodiment, a plurality of stages may be utilized for limiting 
the search ranges. 

Also, although the difference absolute value summation was utilized for the distortion calculation in the above- 
described preferred embodiment, a difference square summation may be utilized. 

Claims 

1. A digital signal processor comprising: 
an instruction memory for previously storing control means to instruct various internal operations as an 
instruction word; 

an internal data memory for storing calculation data; 

a calculator for performing various calculations on at least one data read from the internal data memory in 
accordance with the instruction word read from the instruction memory; 
an accumulator for accumulating an output from the calculator; 
an accumulating register for holding an output from the accumulator; 
" a minimum distortion register for holding a minimum distortion; 
a minimum distortion position register for holding a number of a block having said minimum distortion; 
a block counter for holding a number of a block under distortion calculation; 

a comparator for comparing an output value of the accumulator with a value of said minimum distortion 
register every cycle while, in order to detect the minimum distortion among M blocks (M being a positive 
integer) of a data series, the distortion calculation is performed on a k-th block (1 ≤ k ≤ M, "k" being an
integer) of M blocks of the data series; and,



EP 0 373 291 A2 






an instruction execution controlling unit for executing the control means upon decoding the instruction word 
supplied from the instruction memory. [...] includes [...] means

in which an accumulation [...] the process is advanced to an instruction [...]



said control means.
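
By way of illustration only, the cooperation of the accumulator, the minimum distortion register, the minimum distortion position register, the block counter and the comparator may be sketched in Python as follows. The function name find_min_distortion_block is hypothetical, the difference absolute value summation is assumed as the distortion, and the early advance to the next block once the running sum reaches the stored minimum is an assumption about the purpose of the per-cycle comparison.

```python
def find_min_distortion_block(target, blocks):
    """Return (block number, minimum distortion, accumulate cycles used)."""
    min_distortion = None   # minimum distortion register
    min_position = None     # minimum distortion position register
    cycles = 0
    for block_counter, block in enumerate(blocks, start=1):
        accumulator = 0
        abandoned = False
        for t, c in zip(target, block):
            accumulator += abs(t - c)
            cycles += 1
            # Comparator: checked every cycle against the stored minimum.
            # Assumption: the block is abandoned once it cannot be the minimum.
            if min_distortion is not None and accumulator >= min_distortion:
                abandoned = True
                break
        if not abandoned:
            min_distortion = accumulator
            min_position = block_counter
    return min_position, min_distortion, cycles
```

For a target of [1, 2, 3, 4] and the three candidate blocks [5, 5, 5, 5], [1, 2, 3, 5] and [9, 9, 9, 9], the sketch selects block 2 with a distortion of 1 after 9 accumulate cycles instead of 12, because the third block is abandoned after its first element.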



5. A digital signal processor comprising: [...]

[...] memory; [...] output data in parallel to said calculation unit;

[...] bus in response to values output from said address generating unit.

memory connecting unit; and, [...] the data in units of block between

[...] memory bus, in

[...] operation controlled by said [...] includes an address

6. A digital signal processor as claimed [...] first mode outputting
output mode unit for holding information to select one of first and second [...]

both an upper address and a lower address to [...] holding selection information
outputting only the lower address thereto; and [...] and itself

[...] data memory transfer control unit

[...] size [...] connected to said data bus for representing a horizontal size of a [...]
[...] for representing a horizontal size of a rectangular portion in the two-

dimensional address space; [...] of said external data memory;

a block start address register for indicating a head address of [...] data memory;

a DMA control unit for performing a [...] DMA address calculation unit includes:

and internal memory address register. [...] includes:

an [...] select one of first and second modes in
response to data supplied from said data bus, said first mode outputting both an upper address and a lower
address to said external data memory and said second mode outputting only the lower address thereto; 
and, 

a memory connecting unit for holding selection information whether or not a read/write completion signal is 
handled between said external data memory and itself. 

10. A digital signal processor as claimed in claim 9, wherein said external data memory connecting unit 
includes: 

changing means for inputting/outputting data between said direct data transfer bus and said external data 
memory in case that the address of said external data memory is designated by said direct data memory 
transfer control unit, and for inputting/outputting data between said external data memory and one of both 
the data bus and the data output bus when the address of said external data memory is designated by said 
address generating unit. 

11. A digital signal processor as claimed in claim 10, wherein said external data memory connecting
unit further includes: 

mode setting means for changing an address mode output to said external data memory in accordance with 
a set value of said mode register or said address output mode unit of the DMAC register, and for changing 
such a condition that the read/write completion signal is input or not in response to said set value of said 
mode register or said memory connecting unit of the DMAC register. 

12. A digital signal processor as claimed in claim 11, wherein said external data memory connecting 
unit includes:

means for interrupting data input/output operations between said direct data transfer bus and itself in case 
that a request to access to said external data memory is generated by said calculation unit during 
inputting/outputting data between said external data memory and said direct data transfer bus, and for 
executing inputting/outputting data between said external data memory and one of both said data bus and 
said data output bus. 

13. A digital signal processor including a program memory for storing a microprogram; a control circuit 
for performing a fetch of said microprogram in said program memory, decoding, data reading, a calculation 
and writing of a calculation result in parallel pipelining; a data memory capable of storing 2n-bit data-sized
data, and simultaneously reading out two data; an address generating unit for generating addresses for said 
data memory; a multiplier circuit for performing a multiplication, addition or subtraction between the two 
data read simultaneously from said data memory; a calculation unit for performing an arithmetic calculation 
or accumulation with respect to said two data or resultant data of said multiplier circuit; and, a data bus for 
transferring said two data and the resultant data from said calculation unit, wherein said multiplier circuit 
comprises: 

a first register and a second register for holding one and the other of said two data respectively; 
a first multiplier to a fourth multiplier provided in accordance with four combinations respectively among two 
upper-side bits and two lower-side bits of the two data held in said first and second registers, for performing 
four multiplications respectively in parallel; 

a first shifter to a fourth shifter provided in accordance with said first to fourth multipliers respectively, for 
performing four shift or zero-set processes respectively in parallel in response to said microprogram as to 
the respective resultant data from said first to fourth multipliers; 

a first arithmetic calculator for receiving both outputs from said first and fourth shifters to perform an 
addition or subtraction process in response to said microprogram; 

a second arithmetic calculator for receiving both outputs from said second and third shifters to perform an 
addition or subtraction in response to said microprogram; and, 

a third arithmetic calculator for receiving both outputs from said first and second arithmetic calculators to 
perform an addition or subtraction in response to said microprogram, so as to supply 4n-bit output data to 
said calculation circuit.

14. A digital signal processor as claimed in claim 13, wherein said microprogram includes: 

control means for controlling said first to fourth shifters and said first to third arithmetic calculators in 
response to respective sorts of calculations to be executed.

15. A digital signal processor as claimed in claim 14, wherein said control means includes: 

double precision multiplying means for shifting contents of said second and third shifters to the upper side 
by lower-side bits, for shifting a content of said first shifter to the upper side by a data length, and for
instructing said first to third arithmetic calculation units to perform addition processes respectively. 

16. A digital signal processor as claimed in claim 14 or 15, wherein said control means includes: 
single precision parallel multiplying means for setting all contents of said second and third shifters to zero, 
for shifting a content of the first shifter to the upper side by the data length; and for instructing said first to 
third arithmetic calculation units to perform addition processes respectively.
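
Purely as an illustration of the multiplier circuit of claims 13 to 16, the four multipliers, four shifters and three arithmetic calculators may be modelled in Python as follows. The sketch assumes unsigned 2n-bit operands; the function names split and multiply and the mode labels "double" and "parallel" are hypothetical and do not appear in the claims.

```python
def split(x, n):
    """Split a 2n-bit value into its upper-side and lower-side n bits."""
    mask = (1 << n) - 1
    return (x >> n) & mask, x & mask


def multiply(a, b, n, mode):
    ah, al = split(a, n)   # first register
    bh, bl = split(b, n)   # second register
    # First to fourth multipliers: the four upper/lower combinations.
    p1, p2, p3, p4 = ah * bh, ah * bl, al * bh, al * bl
    if mode == "double":      # double precision multiplication (claim 15)
        s1, s2, s3, s4 = p1 << (2 * n), p2 << n, p3 << n, p4
    elif mode == "parallel":  # single precision parallel multiplication (claim 16)
        s1, s2, s3, s4 = p1 << (2 * n), 0, 0, p4
    else:
        raise ValueError("unknown mode: " + mode)
    first = s1 + s4           # first arithmetic calculator
    second = s2 + s3          # second arithmetic calculator
    return first + second     # third arithmetic calculator, 4n-bit output
```

In the "double" mode the result equals the full product of the two 2n-bit operands. In the "parallel" mode the upper 2n bits of the result hold the product of the upper halves and the lower 2n bits hold the product of the lower halves.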

17. A digital signal processing processor as claimed in any one of claims 14 to 16, wherein said control

[...]

18. A digital signal processing processor as claimed in any one of claims 14 to 17, wherein said control

[...] said [...] arithmetic calculation unit to perform a subtraction process, said second and third

[...] tree search vector [...] means [...]

upper side bits of said first register, a [...] of said second register, for setting all
register, and a third vector element set to the lower [...]

contents of said first and second [...] arithmetic calculating units to perform
[...]

[...]



by said instruction word; 
a data memory for storing calculation data;

a calculation unit for performing a calculation designated by said instruction word for said calculation data 
so as to output a calculation result and a condition; 
a program counter for holding an instruction address; 
a PC stack for preserving a count value of said program counter while the interruption operation is
executed; 

a repeat counter for counting a repeat number during an execution of a repeat instruction; 
a repeat flag register for instructing that said repeat instruction is under execution; 
a repeat flag stack for preserving a register value of said repeat flag register during the interruption 
operation;

a register preserving memory for preserving the respective register values and said count value of the 
repeat counter during the interruption operation; 

an interrupt controlling unit for outputting an interrupt request signal to said instruction execution controlling 
unit upon receipt of an interruption operation, and for outputting an interrupt permit signal to an interrupt
destination; and,

an interrupt enable controlling unit for prohibiting a H/W interrupt operation during a wait cycle of external- 
data-memory accessing, and during decoding/executing an instruction of branch, return, or S/W interrupt. 

24. A digital signal processor as claimed in claim 23, wherein said instruction execution controlling unit 
includes:

pipeline controlling means which contains:
a first stage for fetching said instruction word; 

a second stage for decoding the instruction word fetched by said instruction execution controlling unit; and, 
a third stage for outputting a calculation result and a condition of said calculation unit based upon data 
obtained by decoding said instruction word. 
25. A digital signal processor as claimed in claim 24, further comprising a rear repeat counter where the
count value is subtracted at said second stage. 

26. A motion compensation calculating method for detecting a block and a motion vector of the block 
having a minimum distortion obtained by calculating an inter-pattern analogy between each of blocks in a 
previously input frame and respective blocks of digital image data of a presently input frame, said blocks 

into which said presently input frame of the digital image data composed of a plurality of frames
sequentially input in a time series, is divided, said method comprising the steps of:

setting a first motion vector search range of which size is predetermined and of which center is located at a 
position of an input data block to be encoded within the previously input frame; 
equally subdividing this first search range into a plurality of regions; 
arranging a group of first search motion vectors of n in number (n being a positive integer) in the respective
regions at a coarse density; 

calculating, as an intra-range-distortion value, a sum of distortion values, each of which representing an
inter-pattern analogy between the input data block and a block data of a position pointed by the respective 
motion vectors of n in number; 
detecting a region where the intra-range-distortion value becomes minimum within the first search region;
setting as a minimum distortion region, a region where a distortion amount within this region becomes
minimum; 

setting a second motion vector search range of which size is smaller than that of the first search range and
of which center is located at a position of the minimum intra-region distortion value region; 
arranging a group of second search motion vectors at a higher density within the second search range; and,
detecting a most analogous block to the input data block through a minimum distortion calculation based 
upon the group of second search motion vectors, whereby both the block with this minimum distortion and 
the motion vector thereof can be used as a final prediction signal and a final motion vector respectively. 

27. A motion compensation calculating method as claimed in claim 26, wherein the distortion calculation 
is carried out by a difference absolute value summation.

28. A motion compensation calculating method as claimed in claim 26, wherein the distortion calculation 
is carried out by a difference square summation. 
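
By way of illustration only, the steps of the motion compensation calculating method of claim 26 may be sketched in Python as follows. The function name hierarchical_search and its parameters are hypothetical. The distortion calculation is passed in as a function of the candidate motion vector (dx, dy), and the default values reproduce the example of nine regions, four coarse vectors per region and a 6 x 6 second search range. The placement of the coarse vectors and of the region center is an assumption of this sketch.

```python
def hierarchical_search(distortion, half_range=6, regions_per_side=3,
                        coarse_step=2, k1=6, k2=6):
    """Return (best vector, its distortion, number of distortion evaluations)."""
    region_size = (2 * half_range) // regions_per_side
    evaluations = 0
    best_center, best_sum = None, None
    # First search range: equally subdivided into regions searched at a coarse density.
    for ry in range(regions_per_side):
        for rx in range(regions_per_side):
            x0 = -half_range + rx * region_size
            y0 = -half_range + ry * region_size
            region_sum = 0   # intra-range-distortion value of this region
            for dy in range(y0, y0 + region_size, coarse_step):
                for dx in range(x0, x0 + region_size, coarse_step):
                    region_sum += distortion(dx, dy)
                    evaluations += 1
            if best_sum is None or region_sum < best_sum:
                best_sum = region_sum
                best_center = (x0 + region_size // 2, y0 + region_size // 2)
    # Second search range: smaller, centered on the minimum distortion region,
    # and searched at a higher density (every vector).
    cx, cy = best_center
    best_vector, best_value = None, None
    for dy in range(cy - k2 // 2, cy - k2 // 2 + k2):
        for dx in range(cx - k1 // 2, cx - k1 // 2 + k1):
            value = distortion(dx, dy)
            evaluations += 1
            if best_value is None or value < best_value:
                best_value, best_vector = value, (dx, dy)
    return best_vector, best_value, evaluations
```

With the default parameters the sketch evaluates the distortion 9 x 4 + 6 x 6 = 72 times. Because the first stage samples each region only coarsely, the result is not guaranteed to equal that of full searching for every input.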